Setting the Scene Brazil is the world’s fifth-largest country by area and the sixth most populor. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy. It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure on the right. In 2013, the GDP per capita of the poorest municipality is R$3,190.6. On the other hand, the GDP per capita of the richest municipality is R$314,638. Half of the municipalities have GDP per capita less than R$16,000 and the top 25% municipalities earn R$26,155 and above.
The specific tasks of the analysis are as follows:
For the purpose of this assignments, these two datasets are used.
Firstly, before we get started, it is important for us to install the necessary R packages into R and launch these R packages into R environment.
These are the R packages used for this take home assignment 3.
packages = c('sf','rgdal', 'spdep', 'tidyverse', 'readr', 'dplyr', 'tmap', 'corrplot', 'ggpubr', 'heatmaply', 'cluster', 'factoextra', 'NbClust', 'psych', "geobr")
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Loading required package: rgdal
## Loading required package: sp
## rgdal: version: 1.5-16, (SVN revision 1050)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: C:/Users/THIN SU YEE/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: C:/Users/THIN SU YEE/Documents/R/win-library/4.0/rgdal/proj
## Linking to sp version:1.4-2
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Loading required package: spdep
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: tidyverse
## -- Attaching packages -------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.1 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: tmap
## Loading required package: corrplot
## Warning: package 'corrplot' was built under R version 4.0.3
## corrplot 0.84 loaded
## Loading required package: ggpubr
## Warning: package 'ggpubr' was built under R version 4.0.3
## Loading required package: heatmaply
## Warning: package 'heatmaply' was built under R version 4.0.3
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.0.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Loading required package: viridis
## Loading required package: viridisLite
##
## ======================
## Welcome to heatmaply version 1.1.1
##
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
##
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## Or contact: <tal.galili@gmail.com>
## ======================
## Loading required package: cluster
## Loading required package: factoextra
## Warning: package 'factoextra' was built under R version 4.0.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
## Loading required package: NbClust
## Warning: package 'NbClust' was built under R version 4.0.3
## Loading required package: psych
## Warning: package 'psych' was built under R version 4.0.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## Loading required package: geobr
## Warning: package 'geobr' was built under R version 4.0.3
This code chunk imports BRAZIL_CITIES.csv into R by using read.csv2() and read_delim since csv file is using a semicolon as delimiter.
brazil_cities <- read.csv2("data/aspatial/BRAZIL_CITIES.csv", encoding = 'utf-8')
brazil_data <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")## Parsed with column specification:
## cols(
## .default = col_double(),
## CITY = col_character(),
## STATE = col_character(),
## AREA = col_number(),
## REGIAO_TUR = col_character(),
## CATEGORIA_TUR = col_character(),
## RURAL_URBAN = col_character(),
## GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
This code chunk checks the content of imported brazil_cities data.
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 1 Abadia De Goiás GO 0 6876 6876
## 2 Abadia Dos Dourados MG 0 6704 6704
## 3 Abadiânia GO 0 15757 15609
## 4 Abaeté MG 0 22690 22690
## 5 Abaetetuba PA 0 141100 141040
## IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 1 0 2137 1546 591 5300 69
## 2 0 2328 1481 847 4154 38
## 3 148 4655 3233 1422 10656 139
## 4 0 7694 6667 1027 18464 176
## 5 60 31061 19057 12004 82956 1354
## IBGE_1.4 IBGE_5.9 IBGE_10.14 IBGE_15.59 IBGE_60. IBGE_PLANTED_AREA
## 1 318 438 517 3542 416 319
## 2 207 260 351 2709 589 4479
## 3 650 894 1087 6896 990 10307
## 4 856 1233 1539 11979 2681 1862
## 5 5567 7618 8905 53516 5996 25200
## IBGE_CROP_PRODUCTION_. IDHM.Ranking.2010 IDHM IDHM_Renda IDHM_Longevidade
## 1 1843 1689 0.708 0.687 0.83
## 2 18017 2207 0.69 0.693 0.839
## 3 33085 2202 0.69 0.671 0.841
## 4 7502 1994 0.698 0.72 0.848
## 5 700872 3530 0.628 0.579 0.798
## IDHM_Educacao LONG LAT ALT PAY_TV FIXED_PHONES AREA
## 1 0.622 -49.44054783 -16.75881189 893.6 360 842 147.26
## 2 0.563 -47.39683244 -18.48756496 753.12 77 296 881.06
## 3 0.579 -48.71881214 -16.18267186 1017.55 227 720 1,045.13
## 4 0.556 -45.44619142 -19.15584769 644.74 1230 1716 1,817.07
## 5 0.537 -48.88440382 -1.723469863 10.12 3389 1218 1,610.65
## REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1 8583
## 2 Caminhos Do Cerrado D 6972
## 3 Região TurÃstica Do Ouro E Cristais C 19614
## 4 Lago De Três Marias D 23223
## 5 Araguaia-Tocantins D 156292
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC GVA_TOTAL
## 1 Urbano 6.2 27991.25 74750.32 36915.04 145857.6
## 2 Rural Adjacente 50524.57 25917.7 62689.23 28083.79 167215.28
## 3 Rural Adjacente 42.84 16728.3 138198.58 63396.2 261161.91
## 4 Urbano 113824.6 31002.62 172.33 86081.41 403241.27
## 5 Urbano 140463.72 58610 468128.69 486872.4 1154074.81
## TAXES GDP POP_GDP GDP_CAPITA
## 1 20554.2 166.41 8053 20664.57
## 2 12873.5 180.09 7037 25591.7
## 3 26822.58 287984.49 18427 15628.4
## 4 26994.09 430235.36 23574 18250.42
## 5 95180.48 1249255.29 151934 8222.36
## GVA_MAIN
## 1 Demais serviços
## 2 Demais serviços
## 3 Demais serviços
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H
## 1 28227691 284 5 1 56 0 2 29 110 26
## 2 17909274 476 6 6 30 1 2 34 190 70
## 3 37513019 288 5 9 26 0 2 7 117 12
## 4 NA 621 18 1 40 0 1 20 303 62
## 5 NA 931 4 2 43 0 1 27 500 16
## COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 1 4 5 0 2 10 12 4 6 6 1 5
## 2 28 11 0 4 15 29 2 9 14 6 19
## 3 57 2 1 0 7 15 3 11 5 1 8
## 4 30 9 6 4 28 27 2 15 19 9 27
## 5 31 6 1 1 22 16 2 155 33 15 56
## COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank Pr_Assets
## 1 0 0 NA NA NA NA NA NA NA
## 2 0 0 NA NA NA NA NA NA NA
## 3 0 0 1 34 1 1 1 1 33724584
## 4 0 0 NA NA 2 2 2 2 44974716
## 5 0 0 NA NA 2 4 2 4 76181384
## Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART POST_OFFICES
## 1 NA 2158 1246 0 NA NA NA 1
## 2 NA 2227 1142 0 NA NA NA 1
## 3 67091904 2838 1426 0 NA NA NA 3
## 4 371922572 6928 2953 0 NA NA NA 4
## 5 800078483 5277 25661 0 NA NA NA 2
## # A tibble: 5 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abad~ GO 0 6876 6876 0 2137
## 2 Abad~ MG 0 6704 6704 0 2328
## 3 Abad~ GO 0 15757 15609 148 4655
## 4 Abae~ MG 0 22690 22690 0 7694
## 5 Abae~ PA 0 141100 141040 60 31061
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
Next, this code chunks imports Data_Dictionary.csv into R by using read.csv2() since the file contains “;” as delimiter.
This code chunk checks the content of imported data_dictionary which contains the metadata of the columns in brazil_cities data.
## ï..FIELD DESCRIPTION REFERENCE UNIT
## 1 CITY Name of the City
## 2 STATE Name of the State
## 3 CAPITAL 1 if Capital of State
## 4 IBGE_RES_POP Resident Population 2010 -
## 5 IBGE_RES_POP_BRAS Resident Population Brazilian 2010 -
## SOURCE X
## 1 -
## 2 -
## 3 -
## 4 https://sidra.ibge.gov.br/tabela/1497
## 5 https://sidra.ibge.gov.br/tabela/1497
This code uses read_municipality() of geobr package to obtain a copy of 2016 municipality boundary file.
## Using year 2016
## Loading data for the whole country. This might take a few minutes.
##
|
| | 0%
|
|=== | 4%
|
|===== | 7%
|
|======== | 11%
|
|========== | 15%
|
|============= | 19%
|
|================ | 22%
|
|================== | 26%
|
|===================== | 30%
|
|======================= | 33%
|
|========================== | 37%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================== | 48%
|
|==================================== | 52%
|
|======================================= | 56%
|
|========================================= | 59%
|
|============================================ | 63%
|
|=============================================== | 67%
|
|================================================= | 70%
|
|==================================================== | 74%
|
|====================================================== | 78%
|
|========================================================= | 81%
|
|============================================================ | 85%
|
|============================================================== | 89%
|
|================================================================= | 93%
|
|=================================================================== | 96%
|
|======================================================================| 100%
## Rows: 5,572
## Columns: 5
## $ code_muni <dbl> 1100015, 1100023, 1100031, 1100049, 1100056, 1100064, ...
## $ name_muni <chr> "Alta Floresta D'oeste", "Ariquemes", "Cabixi", "Cacoa...
## $ code_state <chr> "11", "11", "11", "11", "11", "11", "11", "11", "11", ...
## $ abbrev_state <chr> "RO", "RO", "RO", "RO", "RO", "RO", "RO", "RO", "RO", ...
## $ geom <MULTIPOLYGON [°]> MULTIPOLYGON (((-62.19465 -..., MULTIPOLY...
Observation: Based on the downloaded data, we can see that there are 5572 observations in this municipal boundary file. But not all of those are the interest of the study. Therefore, it is required to filter out unnecessary data and only take in the data that are needed to carry out the analysis of the study.
This code chunks uses read_metro_area() of geobr package to obtain a copy of the metropolitan boundary file.
## Using year 2016
##
|
| | 0%
|
|======================================================================| 100%
##
Downloading: 770 B
Downloading: 770 B
Downloading: 1.9 kB
Downloading: 1.9 kB
Downloading: 2.5 kB
Downloading: 2.5 kB
Downloading: 2.5 kB
Downloading: 2.5 kB
Downloading: 12 kB
Downloading: 12 kB
Downloading: 29 kB
Downloading: 29 kB
Downloading: 29 kB
Downloading: 29 kB
Downloading: 29 kB
Downloading: 29 kB
Downloading: 45 kB
Downloading: 45 kB
Downloading: 45 kB
Downloading: 45 kB
Downloading: 61 kB
Downloading: 61 kB
Downloading: 69 kB
Downloading: 69 kB
Downloading: 85 kB
Downloading: 85 kB
Downloading: 85 kB
Downloading: 85 kB
Downloading: 85 kB
Downloading: 85 kB
Downloading: 93 kB
Downloading: 93 kB
Downloading: 110 kB
Downloading: 110 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 130 kB
Downloading: 150 kB
Downloading: 150 kB
Downloading: 160 kB
Downloading: 160 kB
Downloading: 170 kB
Downloading: 170 kB
Downloading: 180 kB
Downloading: 180 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 200 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 210 kB
Downloading: 230 kB
Downloading: 230 kB
Downloading: 250 kB
Downloading: 250 kB
Downloading: 260 kB
Downloading: 260 kB
Downloading: 280 kB
Downloading: 280 kB
Downloading: 300 kB
Downloading: 300 kB
Downloading: 310 kB
Downloading: 310 kB
Downloading: 330 kB
Downloading: 330 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 340 kB
Downloading: 350 kB
Downloading: 350 kB
Downloading: 370 kB
Downloading: 370 kB
Downloading: 380 kB
Downloading: 380 kB
Downloading: 400 kB
Downloading: 400 kB
Downloading: 420 kB
Downloading: 420 kB
Downloading: 430 kB
Downloading: 430 kB
Downloading: 450 kB
Downloading: 450 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 470 kB
Downloading: 490 kB
Downloading: 490 kB
Downloading: 510 kB
Downloading: 510 kB
Downloading: 520 kB
Downloading: 520 kB
Downloading: 540 kB
Downloading: 540 kB
Downloading: 550 kB
Downloading: 550 kB
Downloading: 570 kB
Downloading: 570 kB
Downloading: 590 kB
Downloading: 590 kB
Downloading: 600 kB
Downloading: 600 kB
Downloading: 620 kB
Downloading: 620 kB
Downloading: 640 kB
Downloading: 640 kB
Downloading: 650 kB
Downloading: 650 kB
Downloading: 670 kB
Downloading: 670 kB
Downloading: 680 kB
Downloading: 680 kB
Downloading: 700 kB
Downloading: 700 kB
Downloading: 720 kB
Downloading: 720 kB
Downloading: 720 kB
Downloading: 720 kB
Downloading: 720 kB
Downloading: 720 kB
Downloading: 730 kB
Downloading: 730 kB
Downloading: 750 kB
Downloading: 750 kB
Downloading: 770 kB
Downloading: 770 kB
Downloading: 780 kB
Downloading: 780 kB
Downloading: 800 kB
Downloading: 800 kB
Downloading: 810 kB
Downloading: 810 kB
Downloading: 810 kB
Downloading: 810 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 830 kB
Downloading: 840 kB
Downloading: 840 kB
Downloading: 850 kB
Downloading: 850 kB
Downloading: 870 kB
Downloading: 870 kB
Downloading: 890 kB
Downloading: 890 kB
Downloading: 890 kB
Downloading: 890 kB
Downloading: 900 kB
Downloading: 900 kB
Downloading: 910 kB
Downloading: 910 kB
Downloading: 920 kB
Downloading: 920 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 940 kB
Downloading: 950 kB
Downloading: 950 kB
Downloading: 960 kB
Downloading: 960 kB
Downloading: 970 kB
Downloading: 970 kB
Downloading: 980 kB
Downloading: 980 kB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.1 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.2 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.3 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.4 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.5 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.6 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.7 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.8 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 1.9 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.1 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.2 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.3 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.4 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.5 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.6 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.7 MB
Downloading: 2.8 MB
Downloading: 2.8 MB
Downloading: 2.8 MB
Downloading: 2.8 MB
Downloading: 2.8 MB
Downloading: 2.8 MB
## Rows: 1,331
## Columns: 10
## $ code_muni <dbl> 2700300, 2701506, 2702009, 2702355, 2702603, 27029...
## $ name_muni <chr> "Arapiraca", "Campo Grande", "Coité Do Nóia", "Cra...
## $ code_state <chr> "27", "27", "27", "27", "27", "27", "27", "27", "2...
## $ abbrev_state <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ name_metro <chr> "RM Agreste", "RM Agreste", "RM Agreste", "RM Agre...
## $ type <chr> "RM", "RM", "RM", "RM", "RM", "RM", "RM", "RM", "R...
## $ subdivision <chr> "NÃO TEM", "NÃO TEM", "NÃO TEM", "NÃO TEM", "NÃO T...
## $ legislation <chr> "Lei Complementar 27", "Lei Complementar 27", "Lei...
## $ legislation_date <chr> "30.11.2009", "30.11.2009", "30.11.2009", "30.11.2...
## $ geom <MULTIPOLYGON [°]> MULTIPOLYGON (((-36.66005 -..., MULTI...
Observation: Based on the downloaded data, we can see that there are 1331 observations in this metropolitan boundary file. But not all of those are the interest of the study. Therefore, it is required to filter out unnecessary data and only take in the data that are needed to carry out the analysis of the study.
Based on the observations above, it is necessary to filter out the data to meet the requirement of the study which is only to observe the list of municipals listed under São Paulo Macrometropolis to compute location quotients by industry type, at municipality level. ### 3.3.1. Extracting Municipals of Braganca Paulista city This code chunk will perform the data wrangling to get the list of 10 municipals in Regional Unit of Bragança Paulista city and add a new column name name_metro to the newly prepared muni_state simple feature frame.
muni_state <- municipal %>%
filter(abbrev_state =="SP" & name_muni %in% c('Atibaia', 'Bom Jesus Dos Perdões', 'Bragança Paulista', 'Joanópolis', 'Nazaré Paulista', 'Pedra Bela', 'Pinhalzinho', 'Piracaia', 'Tuiuti', 'Vargem')) %>%
mutate(name_metro = 'Regional Unit of Bragança Paulista city')
muni_state## Simple feature collection with 10 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -46.76118 ymin: -23.31611 xmax: -46.04105 ymax: -22.70131
## geographic CRS: SIRGAS 2000
## code_muni name_muni code_state abbrev_state
## 1 3504107 Atibaia 35 SP
## 2 3507100 Bom Jesus Dos Perdões 35 SP
## 3 3507605 Bragança Paulista 35 SP
## 4 3525508 Joanópolis 35 SP
## 5 3532405 Nazaré Paulista 35 SP
## 6 3536802 Pedra Bela 35 SP
## 7 3538204 Pinhalzinho 35 SP
## 8 3538600 Piracaia 35 SP
## 9 3554953 Tuiuti 35 SP
## 10 3556354 Vargem 35 SP
## geom name_metro
## 1 MULTIPOLYGON (((-46.66146 -... Regional Unit of Bragança Paulista city
## 2 MULTIPOLYGON (((-46.47303 -... Regional Unit of Bragança Paulista city
## 3 MULTIPOLYGON (((-46.50858 -... Regional Unit of Bragança Paulista city
## 4 MULTIPOLYGON (((-46.14383 -... Regional Unit of Bragança Paulista city
## 5 MULTIPOLYGON (((-46.26139 -... Regional Unit of Bragança Paulista city
## 6 MULTIPOLYGON (((-46.46284 -... Regional Unit of Bragança Paulista city
## 7 MULTIPOLYGON (((-46.53711 -... Regional Unit of Bragança Paulista city
## 8 MULTIPOLYGON (((-46.34589 -... Regional Unit of Bragança Paulista city
## 9 MULTIPOLYGON (((-46.6655 -2... Regional Unit of Bragança Paulista city
## 10 MULTIPOLYGON (((-46.39475 -... Regional Unit of Bragança Paulista city
Observation: By this data wrangling, muni_state simple feature frame now only contains the 10 municipals of SP state and 5 column names ’code muni, name_muni, code_state, abbrev_state, geom, and name metro.
This code chunk will perform data wrangling to keep only the code_muni, name_muni, code_state, abbrev_state, name_metro and geom as the study area in the metro_state simple feature frame.
metro_state <- metropolian %>%
filter(abbrev_state =="SP") %>%
select(code_muni, name_muni, code_state, abbrev_state, name_metro,geom)
metro_state## Simple feature collection with 164 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni code_state abbrev_state
## 1 3508405 Cabreúva 35 SP
## 2 3509601 Campo Limpo Paulista 35 SP
## 3 3524006 Itupeva 35 SP
## 4 3525201 Jarinu 35 SP
## 5 3525904 Jundiaí 35 SP
## 6 3527306 Louveira 35 SP
## 7 3556503 Várzea Paulista 35 SP
## 8 3500600 Águas De São Pedro 35 SP
## 9 3502002 Analândia 35 SP
## 10 3503307 Araras 35 SP
## name_metro
## 1 Aglomeração Urbana de Jundiaí
## 2 Aglomeração Urbana de Jundiaí
## 3 Aglomeração Urbana de Jundiaí
## 4 Aglomeração Urbana de Jundiaí
## 5 Aglomeração Urbana de Jundiaí
## 6 Aglomeração Urbana de Jundiaí
## 7 Aglomeração Urbana de Jundiaí
## 8 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## geom
## 1 MULTIPOLYGON (((-47.09692 -...
## 2 MULTIPOLYGON (((-46.78614 -...
## 3 MULTIPOLYGON (((-47.04599 -...
## 4 MULTIPOLYGON (((-46.71471 -...
## 5 MULTIPOLYGON (((-46.85676 -...
## 6 MULTIPOLYGON (((-46.91599 -...
## 7 MULTIPOLYGON (((-46.82121 -...
## 8 MULTIPOLYGON (((-47.86297 -...
## 9 MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...
Observation: By this data wrangling, the created metro_state will now only contains the 164 metro that belong to “SP” state along with the 5 columns names; code_muni, name_muni, code_state, abbrev_state, name_metro and geom. It is also found that metro_state and muni_state simple feature frameshave the same number of column names. Therefore, it is possible to merge them to get the study area of the analysis.
This code chunk will merge the metro_state and muni_state together into one simple feature to form the study area of the assignment.
## Simple feature collection with 174 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni code_state abbrev_state
## 1 3508405 Cabreúva 35 SP
## 2 3509601 Campo Limpo Paulista 35 SP
## 3 3524006 Itupeva 35 SP
## 4 3525201 Jarinu 35 SP
## 5 3525904 Jundiaí 35 SP
## 6 3527306 Louveira 35 SP
## 7 3556503 Várzea Paulista 35 SP
## 8 3500600 Águas De São Pedro 35 SP
## 9 3502002 Analândia 35 SP
## 10 3503307 Araras 35 SP
## name_metro
## 1 Aglomeração Urbana de Jundiaí
## 2 Aglomeração Urbana de Jundiaí
## 3 Aglomeração Urbana de Jundiaí
## 4 Aglomeração Urbana de Jundiaí
## 5 Aglomeração Urbana de Jundiaí
## 6 Aglomeração Urbana de Jundiaí
## 7 Aglomeração Urbana de Jundiaí
## 8 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba
## geom
## 1 MULTIPOLYGON (((-47.09692 -...
## 2 MULTIPOLYGON (((-46.78614 -...
## 3 MULTIPOLYGON (((-47.04599 -...
## 4 MULTIPOLYGON (((-46.71471 -...
## 5 MULTIPOLYGON (((-46.85676 -...
## 6 MULTIPOLYGON (((-46.91599 -...
## 7 MULTIPOLYGON (((-46.82121 -...
## 8 MULTIPOLYGON (((-47.86297 -...
## 9 MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...
Observation: After merging the data, the data from the 164 municipals from sao_paulo_metro_boundary and 10 municipals from braganca_paulista city are combined together to form the study of the area of the assignment. It contains a total of 174 municipals with 5 column names to describe them.
This code chunk will filter by the brazil cities that belong to Sao Paulo state only and its industry data.
brazil_cities_industry <- brazil_data %>%
filter(STATE == "SP")%>%
select (CITY,contains(c("COMP")))
brazil_cities_industry## # A tibble: 645 x 23
## CITY COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adam~ 1726 272 0 102 0 4 35 663 102 99
## 2 Adol~ 81 10 2 3 0 0 2 33 5 4
## 3 Aguaí 799 149 9 54 0 3 21 333 44 51
## 4 Água~ 212 48 0 18 0 0 9 63 7 21
## 5 Água~ 881 11 0 168 0 1 41 328 11 92
## 6 Água~ 216 60 1 10 0 0 7 68 15 19
## 7 Água~ 202 2 1 4 0 1 6 99 3 40
## 8 Agud~ 840 46 0 91 0 3 38 365 50 64
## 9 Alam~ 137 61 0 14 0 0 4 31 3 8
## 10 Alfr~ 107 31 0 9 0 0 3 37 3 3
## # ... with 635 more rows, and 12 more variables: COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>
Observation: By filtering by the brazil cities that belong to Sao Paulo state only, the data is reduced from original 5573 observations to 645 observation. The number of columns are also further reduced by only ttaking in the name of city, and the columns that contains the name ‘comp’ which are required in carrying out the analysis of the study area.
This code chunk will use the left_join() of the dplyr to combine operation between muni metro study_area and brazil_cities_industry data frames to form the industry study area.
industry_study_area <- left_join(study_area, brazil_cities_industry, by = c("name_muni"= "CITY"))
industry_study_area## Simple feature collection with 174 features and 27 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni code_state abbrev_state
## 1 3508405 Cabreúva 35 SP
## 2 3509601 Campo Limpo Paulista 35 SP
## 3 3524006 Itupeva 35 SP
## 4 3525201 Jarinu 35 SP
## 5 3525904 Jundiaí 35 SP
## 6 3527306 Louveira 35 SP
## 7 3556503 Várzea Paulista 35 SP
## 8 3500600 Águas De São Pedro 35 SP
## 9 3502002 Analândia 35 SP
## 10 3503307 Araras 35 SP
## name_metro COMP_TOT COMP_A COMP_B
## 1 Aglomeração Urbana de Jundiaí 1062 23 2
## 2 Aglomeração Urbana de Jundiaí 1198 16 0
## 3 Aglomeração Urbana de Jundiaí 1773 39 3
## 4 Aglomeração Urbana de Jundiaí 802 29 2
## 5 Aglomeração Urbana de Jundiaí 16395 198 11
## 6 Aglomeração Urbana de Jundiaí 1230 34 0
## 7 Aglomeração Urbana de Jundiaí 1975 8 0
## 8 Aglomeração Urbana de Piracicaba-AU- Piracicaba 202 2 1
## 9 Aglomeração Urbana de Piracicaba-AU- Piracicaba 162 36 3
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba 4662 343 5
## COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M
## 1 180 0 6 54 375 109 69 12 24 17 30
## 2 151 0 2 76 459 52 81 38 9 18 56
## 3 274 1 9 95 584 101 134 33 25 64 79
## 4 73 0 2 41 298 52 71 27 5 8 35
## 5 1030 2 42 742 5431 833 1152 648 310 458 1292
## 6 145 0 6 70 492 106 102 21 8 24 39
## 7 388 0 13 134 782 137 105 32 9 16 54
## 8 4 0 1 6 99 3 40 5 2 1 9
## 9 8 0 0 6 39 26 20 3 0 4 6
## 10 481 1 15 233 1816 269 300 86 61 69 180
## COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S COMP_T COMP_U
## 1 62 2 18 35 4 40 0 0
## 2 73 2 84 31 10 40 0 0
## 3 150 4 61 21 22 74 0 0
## 4 76 2 37 15 11 18 0 0
## 5 1904 9 586 802 230 715 0 0
## 6 65 4 39 19 13 43 0 0
## 7 128 2 66 27 12 62 0 0
## 8 15 2 3 3 2 4 0 0
## 9 4 2 0 2 0 3 0 0
## 10 314 4 112 173 44 156 0 0
## geom
## 1 MULTIPOLYGON (((-47.09692 -...
## 2 MULTIPOLYGON (((-46.78614 -...
## 3 MULTIPOLYGON (((-47.04599 -...
## 4 MULTIPOLYGON (((-46.71471 -...
## 5 MULTIPOLYGON (((-46.85676 -...
## 6 MULTIPOLYGON (((-46.91599 -...
## 7 MULTIPOLYGON (((-46.82121 -...
## 8 MULTIPOLYGON (((-47.86297 -...
## 9 MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...
Observation: By merging the data between study_area and brazil_cities_industry data frames, the industry_study_area of this assignment is formed. This simple feature frame contains 174 features with 27 fields which will all be important in performing the analysis of the study to find out the distribution of spatial specialization by industry type, 2016 at municipality level.
This code chunk will compute the location quotient of each industry from Comp A to U
location_quotient <- industry_study_area %>%
mutate_all(~replace(., is.na(.), 0))%>%
mutate_at(vars(contains("COMP")), funs(LQ=(./COMP_TOT)/(sum(.)/sum(COMP_TOT))))%>%
select(name_muni,name_metro, contains(c("COMP")),contains(c("LQ")))## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Simple feature collection with 174 features and 46 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
## name_muni name_metro
## 1 Cabreúva Aglomeração Urbana de Jundiaí
## 2 Campo Limpo Paulista Aglomeração Urbana de Jundiaí
## 3 Itupeva Aglomeração Urbana de Jundiaí
## 4 Jarinu Aglomeração Urbana de Jundiaí
## 5 Jundiaí Aglomeração Urbana de Jundiaí
## 6 Louveira Aglomeração Urbana de Jundiaí
## 7 Várzea Paulista Aglomeração Urbana de Jundiaí
## 8 Águas De São Pedro Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9 Analândia Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10 Araras Aglomeração Urbana de Piracicaba-AU- Piracicaba
## COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 1 1062 23 2 180 0 6 54 375 109 69
## 2 1198 16 0 151 0 2 76 459 52 81
## 3 1773 39 3 274 1 9 95 584 101 134
## 4 802 29 2 73 0 2 41 298 52 71
## 5 16395 198 11 1030 2 42 742 5431 833 1152
## 6 1230 34 0 145 0 6 70 492 106 102
## 7 1975 8 0 388 0 13 134 782 137 105
## 8 202 2 1 4 0 1 6 99 3 40
## 9 162 36 3 8 0 0 6 39 26 20
## 10 4662 343 5 481 1 15 233 1816 269 300
## COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S COMP_T
## 1 12 24 17 30 62 2 18 35 4 40 0
## 2 38 9 18 56 73 2 84 31 10 40 0
## 3 33 25 64 79 150 4 61 21 22 74 0
## 4 27 5 8 35 76 2 37 15 11 18 0
## 5 648 310 458 1292 1904 9 586 802 230 715 0
## 6 21 8 24 39 65 4 39 19 13 43 0
## 7 32 9 16 54 128 2 66 27 12 62 0
## 8 5 2 1 9 15 2 3 3 2 4 0
## 9 3 0 4 6 4 2 0 2 0 3 0
## 10 86 61 69 180 314 4 112 173 44 156 0
## COMP_U COMP_TOT_LQ COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## 1 0 1 1.6778605 3.651562 2.4607702 0.0000000 2.8991794
## 2 0 1 1.0347030 0.000000 1.8299668 0.0000000 0.8566857
## 3 0 1 1.7041523 3.280845 2.2437006 1.5842341 2.6048465
## 4 0 1 2.8014067 4.835360 1.3215134 0.0000000 1.2796877
## 5 0 1 0.9356347 1.300931 0.9121135 0.3426468 1.3145776
## 6 0 1 2.1415407 0.000000 1.7115357 0.0000000 2.5031940
## 7 0 1 0.3138162 0.000000 2.8522508 0.0000000 3.3777275
## 8 0 1 0.7670632 9.598907 0.2874959 0.0000000 2.5403701
## 9 0 1 17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## 10 0 1 5.6999939 2.079557 1.4979451 0.6024983 1.6510771
## COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ
## 1 1.0260217 1.0891761 2.3423920 1.0040926 0.2172296 0.7488611 0.7166303
## 2 1.2801005 1.1818089 0.9906132 1.0449064 0.6098022 0.2489432 0.6726458
## 3 1.0811904 1.0160042 1.3000805 1.1680065 0.3578223 0.4672463 1.6160023
## 4 1.0315655 1.1461284 1.4797439 1.3681483 0.6472196 0.2065904 0.4465667
## 5 0.9132292 1.0217855 1.1595545 1.0859010 0.7598464 0.6265631 1.2506184
## 6 1.1483657 1.2338187 1.9667916 1.2815757 0.3282286 0.2155259 0.8735281
## 7 1.3690679 1.2213244 1.5831099 0.8216207 0.3114907 0.1510045 0.3626800
## 8 0.5993592 1.5117334 0.3389436 3.0602564 0.4758618 0.3280901 0.2216253
## 9 0.7473491 0.7425761 3.6628230 1.9079376 0.3560152 0.0000000 1.1053905
## 10 1.0084885 1.2015309 1.3168539 0.9944849 0.3546406 0.4335829 0.6625941
## COMP_M_LQ COMP_N_LQ COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ
## 1 0.3859211 0.4593034 2.9852937 0.5309395 0.8911083 0.3054418 0.8125985
## 2 0.6386060 0.4794005 2.6463956 2.1964410 0.6996677 0.6769181 0.7203502
## 3 0.6087236 0.6656025 3.5762909 1.0777504 0.3202562 1.0062523 0.9004581
## 4 0.5962048 0.7455412 3.9530946 1.4451883 0.5057127 1.1122727 0.4842155
## 5 1.0765964 0.9136671 0.8701872 1.1196525 1.3226651 1.1376518 0.9408820
## 6 0.4331729 0.4157581 5.1550925 0.9932454 0.4176721 0.8570994 0.7542301
## 7 0.3735326 0.5098886 1.6052567 1.0468246 0.3696440 0.4927278 0.6772751
## 8 0.6086855 0.5842145 15.6949599 0.4652292 0.4015659 0.8029187 0.4272176
## 9 0.5059855 0.1942573 19.5702586 0.0000000 0.3338120 0.0000000 0.3995276
## 10 0.5274752 0.5298950 1.3600952 0.7525629 1.0033692 0.7653734 0.7219263
## COMP_T_LQ COMP_U_LQ geom
## 1 NaN 0 MULTIPOLYGON (((-47.09692 -...
## 2 NaN 0 MULTIPOLYGON (((-46.78614 -...
## 3 NaN 0 MULTIPOLYGON (((-47.04599 -...
## 4 NaN 0 MULTIPOLYGON (((-46.71471 -...
## 5 NaN 0 MULTIPOLYGON (((-46.85676 -...
## 6 NaN 0 MULTIPOLYGON (((-46.91599 -...
## 7 NaN 0 MULTIPOLYGON (((-46.82121 -...
## 8 NaN 0 MULTIPOLYGON (((-47.86297 -...
## 9 NaN 0 MULTIPOLYGON (((-47.67277 -...
## 10 NaN 0 MULTIPOLYGON (((-47.23694 -...
Observation: From the location quotient simple feature dataframe, it can be seen that newly computed location quotient for comp A to U are added into the dataframe and now it contains 46 fields
This code chunk will convert the muni_state to Spatial Polygon
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition,
## but +towgs84= values preserved
## class : SpatialPolygonsDataFrame
## features : 10
## extent : -46.76118, -46.04105, -23.31611, -22.70131 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## variables : 5
## names : code_muni, name_muni, code_state, abbrev_state, name_metro
## min values : 3504107, Atibaia, 35, SP, Regional Unit of Bragança Paulista city
## max values : 3556354, Vargem, 35, SP, Regional Unit of Bragança Paulista city
Observation: The muni_state simple feature dataframe is being converted to spatial polygon data frame called muni_state_sp.
This code chunk will conver the location_quotient into spatial polygon
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition,
## but +towgs84= values preserved
## class : SpatialPolygonsDataFrame
## features : 174
## extent : -48.40857, -44.16137, -24.492, -22.01305 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## variables : 46
## names : name_muni, name_metro, COMP_TOT, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, ...
## min values : Águas De São Pedro, Aglomeração Urbana de Jundiaí, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## max values : Votorantim, RM São Paulo, 530446, 1125, 79, 31566, 332, 657, 25222, 150633, 19515, 29290, 38720, 23738, 14003, ...
Observation: The location_quotient simple feature dataframe is being converted to spatial polygon data frame called location_quotient_sp.
In this step, I will be creating choropleth maps to show the distribution of spatial specialisation by industry type, 2-16 at municipality level.
comp_a_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_A_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Agriculture_A_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_b_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_B_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Extractive_B_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_c_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_C_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Transformation_C_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_d_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_D_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Electricity_D_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_e_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_E_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "WasteManagment_E_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_f_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_F_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Construction_F_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_g_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_G_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Trade_G_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_h_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_H_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Transport_H_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_i_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_I_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Accommodation_I_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_j_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_J_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "ICT_J_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_k_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_K_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Financial_K_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_l_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_L_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "RealEstate_L_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_m_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_M_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Scientific_M_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_n_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_N_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Administrative_N_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_o_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_O_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "PublicAdm_O_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_p_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_P_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Education_P_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_q_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_Q_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Health_Q_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_r_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_R_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Arts&Culture_R_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_s_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_S_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Service_S_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_t_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_T_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Domestic_T_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) comp_u_map <- tm_shape(location_quotient_sp) +
tm_fill("COMP_U_LQ") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "International_U_LQ",
main.title.position = "center",
main.title.size = 0.8,
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE) This code chunk will show the distribution of spatial specialization by industry type at 2016 municipal level.
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
Geocommunication: Based on the choropleth maps that created to show the distribution of spatial specialization by industry type at municipal, we can see that Construction_F, Trade_G, RealEstate_L, Education_P, Health_Q, Arts_R, Service_S are widely spread out across Greater Sao Paulo State. Moreover, it can be seen that Agriculture_A industry is highly specialized in a few municipalities of the Sao Paulo State and those municipalities that highly specialized in Agriculture_A industry is clustered around together. However, there is no Domestic_T industry in the Sao Paulo State. Therefore, this industry T will be removed from the study area for the subsequent analysis.
Before we perform cluster analysis, it is important for us to ensure that the cluster variables are not highly correlated.Therefore, correlation analysis needs to be carried out. This code chunk will use coorplot.mixed() function of corrplot package to visualise and analyse the correlation of the input variables.
lq_cluster <- as.data.frame(location_quotient)%>%
mutate_all(~replace(., is.na(.), 0)) %>%
select(name_muni, contains("LQ"), -"COMP_T_LQ", -"COMP_TOT_LQ")
cluster_vars.cor = cor(lq_cluster[2:21])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "ellipse",
tl.pos = "lt",
diag = "l",
tl.col = "black") Observation: Based on the correlation plot, we can see that Financial_K & Professional_M are highly correlated. This suggest only one of industry should be used in the cluster analysis instead of both.
The code chunk below will be used to extract the clustering variables lq_cluster simple feature object into data.frame.
lq_cluster_wo_m <- as.data.frame(lq_cluster)%>%
mutate_all(~replace(., is.na(.), 0)) %>%
select(-"COMP_M_LQ")
head(lq_cluster_wo_m,10)## name_muni COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## 1 Cabreúva 1.6778605 3.651562 2.4607702 0.0000000 2.8991794
## 2 Campo Limpo Paulista 1.0347030 0.000000 1.8299668 0.0000000 0.8566857
## 3 Itupeva 1.7041523 3.280845 2.2437006 1.5842341 2.6048465
## 4 Jarinu 2.8014067 4.835360 1.3215134 0.0000000 1.2796877
## 5 Jundiaí 0.9356347 1.300931 0.9121135 0.3426468 1.3145776
## 6 Louveira 2.1415407 0.000000 1.7115357 0.0000000 2.5031940
## 7 Várzea Paulista 0.3138162 0.000000 2.8522508 0.0000000 3.3777275
## 8 Águas De São Pedro 0.7670632 9.598907 0.2874959 0.0000000 2.5403701
## 9 Analândia 17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## 10 Araras 5.6999939 2.079557 1.4979451 0.6024983 1.6510771
## COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ
## 1 1.0260217 1.0891761 2.3423920 1.0040926 0.2172296 0.7488611 0.7166303
## 2 1.2801005 1.1818089 0.9906132 1.0449064 0.6098022 0.2489432 0.6726458
## 3 1.0811904 1.0160042 1.3000805 1.1680065 0.3578223 0.4672463 1.6160023
## 4 1.0315655 1.1461284 1.4797439 1.3681483 0.6472196 0.2065904 0.4465667
## 5 0.9132292 1.0217855 1.1595545 1.0859010 0.7598464 0.6265631 1.2506184
## 6 1.1483657 1.2338187 1.9667916 1.2815757 0.3282286 0.2155259 0.8735281
## 7 1.3690679 1.2213244 1.5831099 0.8216207 0.3114907 0.1510045 0.3626800
## 8 0.5993592 1.5117334 0.3389436 3.0602564 0.4758618 0.3280901 0.2216253
## 9 0.7473491 0.7425761 3.6628230 1.9079376 0.3560152 0.0000000 1.1053905
## 10 1.0084885 1.2015309 1.3168539 0.9944849 0.3546406 0.4335829 0.6625941
## COMP_N_LQ COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ COMP_U_LQ
## 1 0.4593034 2.9852937 0.5309395 0.8911083 0.3054418 0.8125985 0
## 2 0.4794005 2.6463956 2.1964410 0.6996677 0.6769181 0.7203502 0
## 3 0.6656025 3.5762909 1.0777504 0.3202562 1.0062523 0.9004581 0
## 4 0.7455412 3.9530946 1.4451883 0.5057127 1.1122727 0.4842155 0
## 5 0.9136671 0.8701872 1.1196525 1.3226651 1.1376518 0.9408820 0
## 6 0.4157581 5.1550925 0.9932454 0.4176721 0.8570994 0.7542301 0
## 7 0.5098886 1.6052567 1.0468246 0.3696440 0.4927278 0.6772751 0
## 8 0.5842145 15.6949599 0.4652292 0.4015659 0.8029187 0.4272176 0
## 9 0.1942573 19.5702586 0.0000000 0.3338120 0.0000000 0.3995276 0
## 10 0.5298950 1.3600952 0.7525629 1.0033692 0.7653734 0.7219263 0
Observation: By looking at the lq_cluster_wo_m data frame, we can notice that the final clustering variables list does not include variable COMP_M_LQ because it is highly correlated with variable COMP_K_LQ
Next, we need to change the rows by municipality name instead of row number so that we can remove the name_muni column from the data.
This code chunk below will replace the rows with municipal name instead of row number.
## name_muni COMP_A_LQ COMP_B_LQ COMP_C_LQ
## Cabreúva Cabreúva 1.6778605 3.651562 2.4607702
## Campo Limpo Paulista Campo Limpo Paulista 1.0347030 0.000000 1.8299668
## Itupeva Itupeva 1.7041523 3.280845 2.2437006
## Jarinu Jarinu 2.8014067 4.835360 1.3215134
## Jundiaí Jundiaí 0.9356347 1.300931 0.9121135
## Louveira Louveira 2.1415407 0.000000 1.7115357
## Várzea Paulista Várzea Paulista 0.3138162 0.000000 2.8522508
## Águas De São Pedro Águas De São Pedro 0.7670632 9.598907 0.2874959
## Analândia Analândia 17.2163080 35.907023 0.7169652
## Araras Araras 5.6999939 2.079557 1.4979451
## COMP_D_LQ COMP_E_LQ COMP_F_LQ COMP_G_LQ COMP_H_LQ
## Cabreúva 0.0000000 2.8991794 1.0260217 1.0891761 2.3423920
## Campo Limpo Paulista 0.0000000 0.8566857 1.2801005 1.1818089 0.9906132
## Itupeva 1.5842341 2.6048465 1.0811904 1.0160042 1.3000805
## Jarinu 0.0000000 1.2796877 1.0315655 1.1461284 1.4797439
## Jundiaí 0.3426468 1.3145776 0.9132292 1.0217855 1.1595545
## Louveira 0.0000000 2.5031940 1.1483657 1.2338187 1.9667916
## Várzea Paulista 0.0000000 3.3777275 1.3690679 1.2213244 1.5831099
## Águas De São Pedro 0.0000000 2.5403701 0.5993592 1.5117334 0.3389436
## Analândia 0.0000000 0.0000000 0.7473491 0.7425761 3.6628230
## Araras 0.6024983 1.6510771 1.0084885 1.2015309 1.3168539
## COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ COMP_N_LQ
## Cabreúva 1.0040926 0.2172296 0.7488611 0.7166303 0.4593034
## Campo Limpo Paulista 1.0449064 0.6098022 0.2489432 0.6726458 0.4794005
## Itupeva 1.1680065 0.3578223 0.4672463 1.6160023 0.6656025
## Jarinu 1.3681483 0.6472196 0.2065904 0.4465667 0.7455412
## Jundiaí 1.0859010 0.7598464 0.6265631 1.2506184 0.9136671
## Louveira 1.2815757 0.3282286 0.2155259 0.8735281 0.4157581
## Várzea Paulista 0.8216207 0.3114907 0.1510045 0.3626800 0.5098886
## Águas De São Pedro 3.0602564 0.4758618 0.3280901 0.2216253 0.5842145
## Analândia 1.9079376 0.3560152 0.0000000 1.1053905 0.1942573
## Araras 0.9944849 0.3546406 0.4335829 0.6625941 0.5298950
## COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ
## Cabreúva 2.9852937 0.5309395 0.8911083 0.3054418 0.8125985
## Campo Limpo Paulista 2.6463956 2.1964410 0.6996677 0.6769181 0.7203502
## Itupeva 3.5762909 1.0777504 0.3202562 1.0062523 0.9004581
## Jarinu 3.9530946 1.4451883 0.5057127 1.1122727 0.4842155
## Jundiaí 0.8701872 1.1196525 1.3226651 1.1376518 0.9408820
## Louveira 5.1550925 0.9932454 0.4176721 0.8570994 0.7542301
## Várzea Paulista 1.6052567 1.0468246 0.3696440 0.4927278 0.6772751
## Águas De São Pedro 15.6949599 0.4652292 0.4015659 0.8029187 0.4272176
## Analândia 19.5702586 0.0000000 0.3338120 0.0000000 0.3995276
## Araras 1.3600952 0.7525629 1.0033692 0.7653734 0.7219263
## COMP_U_LQ
## Cabreúva 0
## Campo Limpo Paulista 0
## Itupeva 0
## Jarinu 0
## Jundiaí 0
## Louveira 0
## Várzea Paulista 0
## Águas De São Pedro 0
## Analândia 0
## Araras 0
Observation: Based on the content of the lq_cluster_wo_m, it can be seen that the row has been replaced with the name_numi.
Now, this code chunk will delete the name_muni field by using the code chunk below.
## COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## Cabreúva 1.6778605 3.651562 2.4607702 0.0000000 2.8991794
## Campo Limpo Paulista 1.0347030 0.000000 1.8299668 0.0000000 0.8566857
## Itupeva 1.7041523 3.280845 2.2437006 1.5842341 2.6048465
## Jarinu 2.8014067 4.835360 1.3215134 0.0000000 1.2796877
## Jundiaí 0.9356347 1.300931 0.9121135 0.3426468 1.3145776
## Louveira 2.1415407 0.000000 1.7115357 0.0000000 2.5031940
## Várzea Paulista 0.3138162 0.000000 2.8522508 0.0000000 3.3777275
## Águas De São Pedro 0.7670632 9.598907 0.2874959 0.0000000 2.5403701
## Analândia 17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## Araras 5.6999939 2.079557 1.4979451 0.6024983 1.6510771
## COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ
## Cabreúva 1.0260217 1.0891761 2.3423920 1.0040926 0.2172296
## Campo Limpo Paulista 1.2801005 1.1818089 0.9906132 1.0449064 0.6098022
## Itupeva 1.0811904 1.0160042 1.3000805 1.1680065 0.3578223
## Jarinu 1.0315655 1.1461284 1.4797439 1.3681483 0.6472196
## Jundiaí 0.9132292 1.0217855 1.1595545 1.0859010 0.7598464
## Louveira 1.1483657 1.2338187 1.9667916 1.2815757 0.3282286
## Várzea Paulista 1.3690679 1.2213244 1.5831099 0.8216207 0.3114907
## Águas De São Pedro 0.5993592 1.5117334 0.3389436 3.0602564 0.4758618
## Analândia 0.7473491 0.7425761 3.6628230 1.9079376 0.3560152
## Araras 1.0084885 1.2015309 1.3168539 0.9944849 0.3546406
## COMP_K_LQ COMP_L_LQ COMP_N_LQ COMP_O_LQ COMP_P_LQ
## Cabreúva 0.7488611 0.7166303 0.4593034 2.9852937 0.5309395
## Campo Limpo Paulista 0.2489432 0.6726458 0.4794005 2.6463956 2.1964410
## Itupeva 0.4672463 1.6160023 0.6656025 3.5762909 1.0777504
## Jarinu 0.2065904 0.4465667 0.7455412 3.9530946 1.4451883
## Jundiaí 0.6265631 1.2506184 0.9136671 0.8701872 1.1196525
## Louveira 0.2155259 0.8735281 0.4157581 5.1550925 0.9932454
## Várzea Paulista 0.1510045 0.3626800 0.5098886 1.6052567 1.0468246
## Águas De São Pedro 0.3280901 0.2216253 0.5842145 15.6949599 0.4652292
## Analândia 0.0000000 1.1053905 0.1942573 19.5702586 0.0000000
## Araras 0.4335829 0.6625941 0.5298950 1.3600952 0.7525629
## COMP_Q_LQ COMP_R_LQ COMP_S_LQ COMP_U_LQ
## Cabreúva 0.8911083 0.3054418 0.8125985 0
## Campo Limpo Paulista 0.6996677 0.6769181 0.7203502 0
## Itupeva 0.3202562 1.0062523 0.9004581 0
## Jarinu 0.5057127 1.1122727 0.4842155 0
## Jundiaí 1.3226651 1.1376518 0.9408820 0
## Louveira 0.4176721 0.8570994 0.7542301 0
## Várzea Paulista 0.3696440 0.4927278 0.6772751 0
## Águas De São Pedro 0.4015659 0.8029187 0.4272176 0
## Analândia 0.3338120 0.0000000 0.3995276 0
## Araras 1.0033692 0.7653734 0.7219263 0
Observation: Based on the content of the lq_cluster_wo_m, it can be seen that the column name name_muni has been removed from the data frame.
In general, multiple variables will be used in cluster analysis. It is usual that their values range are different. In order to avoid the cluster analysis result is baised to clustering variables with large values, it is necessary to standardise the input variables before performing cluster analysis.
This code chunk uses normalize() of heatmaply package to standardize the clustering variables by using Min-Max method. Moreover, summary() is also used to display the summary statstic of the standarized clustering variables.
## COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ
## Min. :0.000000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.006057 1st Qu.:0.00000 1st Qu.:0.1922 1st Qu.:0.0000
## Median :0.027794 Median :0.01434 Median :0.2964 Median :0.0000
## Mean :0.112124 Mean :0.06177 Mean :0.3335 Mean :0.0141
## 3rd Qu.:0.115554 3rd Qu.:0.05508 3rd Qu.:0.4438 3rd Qu.:0.0000
## Max. :1.000000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## COMP_E_LQ COMP_F_LQ COMP_G_LQ COMP_H_LQ
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.04312 1st Qu.:0.3288 1st Qu.:0.4829 1st Qu.:0.1765
## Median :0.08449 Median :0.4534 Median :0.5365 Median :0.2500
## Mean :0.10879 Mean :0.4503 Mean :0.5289 Mean :0.2932
## 3rd Qu.:0.14406 3rd Qu.:0.5572 3rd Qu.:0.5960 3rd Qu.:0.3586
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.2037 1st Qu.:0.02648 1st Qu.:0.0952 1st Qu.:0.1478
## Median :0.2356 Median :0.04087 Median :0.1522 Median :0.3109
## Mean :0.2678 Mean :0.05518 Mean :0.1703 Mean :0.3147
## 3rd Qu.:0.2907 3rd Qu.:0.05853 3rd Qu.:0.2238 3rd Qu.:0.4287
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## COMP_N_LQ COMP_O_LQ COMP_P_LQ COMP_Q_LQ
## Min. :0.0000 Min. :0.000000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1198 1st Qu.:0.009808 1st Qu.:0.3069 1st Qu.:0.2019
## Median :0.1961 Median :0.023043 Median :0.4567 Median :0.3136
## Mean :0.2177 Mean :0.053088 Mean :0.4491 Mean :0.3329
## 3rd Qu.:0.2760 3rd Qu.:0.055327 3rd Qu.:0.6018 3rd Qu.:0.4269
## Max. :1.0000 Max. :1.000000 Max. :1.0000 Max. :1.0000
## COMP_R_LQ COMP_S_LQ COMP_U_LQ
## Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.3695 1st Qu.:0.2856 1st Qu.:0.00000
## Median :0.4905 Median :0.3757 Median :0.00000
## Mean :0.4645 Mean :0.3790 Mean :0.01048
## 3rd Qu.:0.5985 3rd Qu.:0.4554 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000
Observation: After standardization, we can notice that the values range of the Min-max standardized clustering variables are 0-1 now.
In R, many packages provide function to calculate distance matrix. In this case, dist() of R will be used to compute the proximity matrix.
Dist() supports six distance proximity calculations which are euclidean, maximum, manhattan, canberra, binary and minkowski.
This code chunk will use the euclidean distance to compute the proximity matrix.
There are serveral packages provided in R for hierarhical clustering function. In this take-home-assignment03, The hclust() of R stats will be used to compute the hierarchical clustering with the use of ‘ward.D’. The hierarchical clustering output will be stored in an object of class hclust which describes the tree produced by the clustering process.
The code chunk will perform hierarchical cluster analysis using ward.D method.
It is taught by our geospatial professor that, one of the challenges in performing hierarchical clustering is to identify stronger clustering structures. This issues can also be solved by using agnes() function of cluster package which provide agglomerative coefficient to measure the amount of clustering strcuture.
The code chunk will used to compute the agglomerative coefficients of all hierarchical clustering algorithms.
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
ac <- function(x) {
agnes(new_lq, method = x)$ac
}
map_dbl(m, ac)## average single complete ward
## 0.9562927 0.9510264 0.9580462 0.9728826
Observation: Based on the output, we can see that ward’s method provides the strongest clustering structure among the other 4 methods since its value 0.9728826 is the closer to 1 compare to the other methods. Therefore, ward’s method will be continued to used for the subsequent analysis.
Another technical challenge face by data analyst in performing clustering analysis is to determine the optimal clusters to retain.
There are three commonly used methods to determine the optimal clusters, they are:
In this take home assignment, Gap Statistic Method will be used to determine the optimal clusters to retain.
The gap statistic compares the total within intra-cluster variation for different values of k with their expected values under null reference distribution of the data. The estimate of the optimal clusters will be value that maximize the gap statistic (i.e., that yields the largest gap statistic). This means that the clustering structure is far away from the random uniform distribution of points.
This code chunk will use clusGap() of cluster package to compute the gap statistic.
set.seed(12345)
gap_stat <- clusGap(new_lq, FUN = hcut, nstart = 25, K.max = 10, B = 50)
# Print the result
print(gap_stat, method = "firstmax")## Clustering Gap statistic ["clusGap"] from call:
## clusGap(x = new_lq, FUNcluster = hcut, K.max = 10, B = 50, nstart = 25)
## B=50 simulated reference sets, k = 1..10; spaceH0="scaledPCA"
## --> Number of clusters (method 'firstmax'): 3
## logW E.logW gap SE.sim
## [1,] 6.832387 8.110970 1.278583 0.01630632
## [2,] 6.563251 7.930109 1.366858 0.02004553
## [3,] 6.423993 7.830754 1.406761 0.01645589
## [4,] 6.340767 7.747398 1.406632 0.01674314
## [5,] 6.291573 7.674303 1.382730 0.01620172
## [6,] 6.129111 7.611438 1.482327 0.01649416
## [7,] 6.075579 7.556450 1.480871 0.01636994
## [8,] 5.998010 7.509528 1.511519 0.01622536
## [9,] 5.903274 7.467097 1.563822 0.01684157
## [10,] 5.850465 7.428672 1.578207 0.01764693
This code chunk will use fviz_gap_stat() of factoextra package to visualize the plot.
Observation: Based on the gap statistic graph, 3 is the recommended number of cluster to choose. It can say that 3 clusters is a logical number of cluster to pick before it increase to next biggest number of cluster which is 6. However, since the data is not very large to use 6 clusters, I will be selecting 3 clusters to perform all analysis.
In the dendrogram, each leaf corresponds to one observation. As we move up the tree, observations that are similar to each other are combined into branches, which are themselves fused at a higher height.
The height of the fusion, provided on the vertical axis, indicates the (dis)similarity between two observations. The higher the height of the fusion, the less similar the observations are. Note that, conclusions about the proximity of two observations can be drawn only based on the height where branches containing those two observations first are fused. We cannot use the proximity of two observations along the horizontal axis as a criteria of their similarity.
It’s also possible to draw the dendrogram with a border around the selected clusters by using rect.hclust() of R stats. The argument border is used to specify the border colors for the rectangles.
Observation: Based on the cluster dendrogram, we can see that municipalities with the relatively same heights are formed into 3 clusters of municipalities together. However, we can say that the cluster size are not evenly spread out since the red cluster contains more municipalities compare to the other two clusters as they are relatively small.
Next, we will perform visually-driven hierarchical clustering analysis by using the heatmaply package since it enable us to build both highly interactive cluster heatmap or static cluster heatmap.
In the code chunk below, the heatmaply() is used to build and plot an interactive cluster heatmap.
heatmaply(normalize(new_lq_mat),
Colv=NA,
dist_method = "euclidean",
hclust_method = "ward.D",
seriate = "OLO",
colors = Blues,
k_row = 3,
margins = c(NA,200,60,NA),
fontsize_row = 4,
fontsize_col = 5,
main="Geographic Segmentation of Industrial Specialization by location quotients",
xlab = "Locatiton Quotients",
ylab = "City"
)Observation: based on the output, we can see that Comp_F_LQ, CompG_LQ, COMP_P_LQ, COMP_Q_LQ and COMP_R_LQ has higher location quotients value. As for the cluste, all 3 clusters contain a few high concentrated industries. For example, the red cluster has high concentration of Industry A (Agriculture, LiveStock, Forestry, Fishing and Aquaculture), Green cluster has high concentration of Industry F (Construction) and Industry P (Education), Blue cluster has high concentration of Industry Q(Health Services). However, the green cluster has more high concentrated industries compared to the other two.
We will use cutree to create a 3 cluster model and save it as a list of factors.
This code chunk will convert the group list objet into a matrix and merge with the spatial dataframe called location_quotient. So that we can plot the map using tmap. After which, rename() of dplyr will be used to rename the matrix groups field as CLUSTER
location_quotient_sf_cluster <- cbind(location_quotient, as.matrix(groups)) %>%
rename(`CLUSTER`=`as.matrix.groups.`)This code chunk will quickly plot the municipalities with respect to their cluster group on the map.
Observation: Based on the map output, we can see that the clusters are dispersed all over the Sau Paulo State. Therefore, it is hard for us to conclude any information from here. However, we can see that among all other cluster, cluster 1 seems to be abit more cluster and located closer to each other compare to other clusters.
In this section, i will be making use of skater apporach to derieve spatially constrained cluster.
We noticed that Ilhabela municipality is disconnected by water source from the rest of Sau Paulo State. Instead of removing it, we will be including a snap of 0.02 so that every polygon will have at least 1 neighbour.
This code chunk will use poly2nd() of spdep package to compute the neighbours list from polygon list.
## Neighbour list object:
## Number of regions: 174
## Number of nonzero links: 882
## Percentage nonzero weights: 2.913199
## Average number of links: 5.068966
## 1 region with no links:
## 100
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 10 22
## 1 4 13 23 36 31 29 16 9 6 5 1
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links
This code chunk will plot the municipalities neighbour list using location_quotient_sp.
plot(location_quotient_sp, border=grey(.5))
plot(location_quotient.nb, coordinates(location_quotient_sp), col="hotpink", add=TRUE)This code chunk will check the name of the municipality with no neighbor.
## [1] "Ilhabela"
Observation: Based on the map output of the municipalities neighbour list, we can see that there is a municipality without neighbour. We also found that the name of municipality is Ilhabela. As it has zero neighbor, we will remove it from the analysis.
This code chunk will remove the municipality named Ilhabela from the dataset. Then compute the neighbour list again.
new_lq_filtered <- new_lq[row.names(new_lq) != "Ilhabela", , drop = FALSE]
location_quotient_sp_filtered <- location_quotient_sp[row.names(new_lq) != "Ilhabela", , drop = FALSE]
location_quotient_sf_cluster <- location_quotient_sf_cluster[location_quotient_sf_cluster$name_muni != "Ilhabela", , drop = FALSE]
location_quotient.nb <- poly2nb(location_quotient_sp_filtered)
summary(location_quotient.nb)## Neighbour list object:
## Number of regions: 173
## Number of nonzero links: 882
## Percentage nonzero weights: 2.946975
## Average number of links: 5.098266
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 22
## 4 13 23 36 31 29 16 9 6 5 1
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links
Observation: Based on the observation, you can see that the municipality with no neighbour is removed and now everyone has aleast 1 neighbour each.
This code chunk will plot the municipalities neighbour list again.
plot(location_quotient_sp_filtered, border=grey(.5))
plot(location_quotient.nb, coordinates(location_quotient_sp_filtered), col="hotpink", add=TRUE)Observation: Based on the map output of the neighbour list, we can now see that the Ilhabela municipality have been removed thus, there is no municipality without a neighbour and all municipalities are well-connected now.
Next, nbcosts() of spdep package is used to compute the cost of each edge. It is the distance between it nodes. This function compute this distance using a data.frame with observations vector in each node.
The code chunk used nbcost() of spdep package to compute the cost of each edge.
This code chunk uses nb2listw() of spdep package to incorporate these costs into a weight object using style B to make sure the cost values are not row-standardized.
location_quotient.w <- nb2listw(location_quotient.nb, lcosts, style="B")
summary(location_quotient.w)## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 173
## Number of nonzero links: 882
## Percentage nonzero weights: 2.946975
## Average number of links: 5.098266
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 22
## 4 13 23 36 31 29 16 9 6 5 1
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 173 29929 13813.92 999945.6 8183825
This code chunk used mstree() of spdep package to compute the mean of minimum spanning tree. This is to ensure that all municipality are interconnected.
This code chunk will check if the class of the computed MST is matrix or not. This is to ensure that the conversion was correct.
## [1] "mst" "matrix"
Observation: Based on the output, we can see that the class of location_quotient.mst is matrix. Hence, the conversion was correct.
This code chunk will check the dimension of computed MST. In this case, the resulted dimension should be n-1 number of observations. This is because the minimum spanning tree consists on n-1 edges (links) in order to traverse all the nodes..
## [1] 172 3
Observation: Based on the output result, we can see that the dimension of computed MST is correct. It is because we have initially 174 observations in the location_quotient. However, after we remove ‘Ilhabela’, it should be 173. And the minimum spanning tree consists of n - 1 edges. Therefore, the dimension of 172 means the spanning tree is correctly computed.
This code chunk plot the minimum spanning tree to see how the initial neighbour list is simplied to just one edge connecting to each of the nodes while passing passing through all other nodes.
plot(location_quotient_sp_filtered, border=gray(.5))
plot.mst(location_quotient.mst, coordinates(location_quotient_sp_filtered),
col="hotpink", cex.lab=0.7, cex.circles=0.005, add=TRUE)The skater() method takes three arguments:
This code chunk uses skater() of spdep package to compute the spatially constrained cluster.
clust3 <- skater(location_quotient.mst[,1:2], new_lq_filtered, method = "euclidean", 2)
class(clust3)## [1] "skater"
Obsevation: The result of the skater() is an object of class skater.
This code chunk will examine the content of clust6.
## List of 8
## $ groups : num [1:173] 1 1 1 1 1 1 1 1 1 1 ...
## $ edges.groups:List of 3
## ..$ :List of 3
## .. ..$ node: num [1:165] 121 92 111 23 104 151 55 25 154 40 ...
## .. ..$ edge: num [1:164, 1:3] 111 23 55 25 40 151 121 92 154 147 ...
## .. ..$ ssw : num 2097
## ..$ :List of 3
## .. ..$ node: num [1:7] 97 117 119 88 102 106 90
## .. ..$ edge: num [1:6, 1:3] 117 97 97 119 117 119 88 117 119 102 ...
## .. ..$ ssw : num 153
## ..$ :List of 3
## .. ..$ node: num 89
## .. ..$ edge: num[0 , 1:3]
## .. ..$ ssw : num 0
## $ not.prune : NULL
## $ candidates : int [1:3] 1 2 3
## $ ssto : num 2654
## $ ssw : num [1:3] 2654 2405 2250
## $ crit : num [1:2] 1 Inf
## $ vec.crit : num [1:173] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "class")= chr "skater"
This code chunk will check the cluster assignment.
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1
## [112] 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
This code chunk will check the number of observation in each cluster by the means of table command.
## ccs3
## 1 2 3
## 165 7 1
This code chunk will plot the pruned tree that shows the 5 clusters across municipality area.
plot(location_quotient_sp_filtered, border=gray(.5))
plot(clust3, coordinates(location_quotient_sp_filtered), cex.lab=.7,
groups.colors=c("red","green","blue", "brown", "pink"), cex.circles=0.005, add=TRUE)## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
The code chunk will visualize the derived cluster based on SKATER clustering onto a choropleth map.
groups_mat <- as.matrix(clust3$groups)
location_quotient_sf_sptialcluster <- cbind(location_quotient_sf_cluster, as.factor(groups_mat)) %>%
rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(location_quotient_sf_sptialcluster, "SP_CLUSTER") Observation: Based on the output of the map resulted from skater method, we can see that clusters are not as fragmented as hierarchical cluster method. Instead, the clusters are grouping together together to provide more homogeneity. Thus, it is easier to differentiate between clusters. We can also noticed that the cluster 1 contains the most number of municipalities, whereas cluster 2 and 3 contains lesser municipal within the cluster. Additionally, cluster 3 only contains 1 municipal. This proves that the 3 clusters is still not optimal for this dataset.
This code chunk display the spatially constrained clustering hierarchy method and skater method side by side for comparison
hier_clust_map <- qtm(location_quotient_sf_cluster,
"CLUSTER") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Hierarchy Method",
main.title.position = "center",
legend.show = TRUE,
legend.text.size = 0.6,
frame = TRUE)
skater_clust_map <- qtm(location_quotient_sf_sptialcluster,
"SP_CLUSTER") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Skater Method",
main.title.position = "center",
legend.show = TRUE,
legend.text.size =0.6,
frame = TRUE)
tmap_arrange(hier_clust_map, skater_clust_map,
asp=0, ncol=2)## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).
## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).
Observation: Based on the two maps, we can say that there are of very different. The cluster in Hierarchy method is very fragmented whereas the cluster in skater method is clustered together with the high homogeneity. However, one thing to take note is that, even though the municipals belong to the cluster changes for the clusters in two different methods, the cluster 3 of hierarchy method and cluster 2 of skater method remain the same.