Brazil is the world’s fifth-largest country by area and the sixth most populous. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy. It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure below. The GDP per capita of the poorest municipality is R$3190.6. On the other hand, the GDP per capita of the richest municipality is R$314638. Half of the municipalities with GDP per capita less than R$16000 and the top 25% municipalities earn R$26155 and above. This assignment determine factors affecting the unequal development of Brazil at the municipality level by using the data provided.
BRAZIL_CITIES.csv. This data file consists of 81 columns and 5573 rows. Each row representing one municipality.
Data_Dictionary.csv. This file provides meta data of each columns in BRAZIL_CITIES.csv.
2016 municipality boundary file from geobr
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse','geobr','cartography','spatstat')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
## Loading required package: olsrr
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
## Loading required package: corrplot
## corrplot 0.84 loaded
## Loading required package: ggpubr
## Loading required package: ggplot2
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Loading required package: spdep
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: GWmodel
## Loading required package: maptools
## Checking rgeos availability: TRUE
## Loading required package: robustbase
## Loading required package: Rcpp
## Loading required package: spatialreg
## Loading required package: Matrix
## Registered S3 methods overwritten by 'spatialreg':
## method from
## residuals.stsls spdep
## deviance.stsls spdep
## coef.stsls spdep
## print.stsls spdep
## summary.stsls spdep
## print.summary.stsls spdep
## residuals.gmsar spdep
## deviance.gmsar spdep
## coef.gmsar spdep
## fitted.gmsar spdep
## print.gmsar spdep
## summary.gmsar spdep
## print.summary.gmsar spdep
## print.lagmess spdep
## summary.lagmess spdep
## print.summary.lagmess spdep
## residuals.lagmess spdep
## deviance.lagmess spdep
## coef.lagmess spdep
## fitted.lagmess spdep
## logLik.lagmess spdep
## fitted.SFResult spdep
## print.SFResult spdep
## fitted.ME_res spdep
## print.ME_res spdep
## print.lagImpact spdep
## plot.lagImpact spdep
## summary.lagImpact spdep
## HPDinterval.lagImpact spdep
## print.summary.lagImpact spdep
## print.sarlm spdep
## summary.sarlm spdep
## residuals.sarlm spdep
## deviance.sarlm spdep
## coef.sarlm spdep
## vcov.sarlm spdep
## fitted.sarlm spdep
## logLik.sarlm spdep
## anova.sarlm spdep
## predict.sarlm spdep
## print.summary.sarlm spdep
## print.sarlm.pred spdep
## as.data.frame.sarlm.pred spdep
## residuals.spautolm spdep
## deviance.spautolm spdep
## coef.spautolm spdep
## fitted.spautolm spdep
## print.spautolm spdep
## summary.spautolm spdep
## logLik.spautolm spdep
## print.summary.spautolm spdep
## print.WXImpact spdep
## summary.WXImpact spdep
## print.summary.WXImpact spdep
## predict.SLX spdep
##
## Attaching package: 'spatialreg'
## The following objects are masked from 'package:spdep':
##
## anova.sarlm, as.spam.listw, as_dgRMatrix_listw, as_dsCMatrix_I,
## as_dsCMatrix_IrW, as_dsTMatrix_listw, bptest.sarlm, can.be.simmed,
## cheb_setup, coef.gmsar, coef.sarlm, coef.spautolm, coef.stsls,
## create_WX, deviance.gmsar, deviance.sarlm, deviance.spautolm,
## deviance.stsls, do_ldet, eigen_pre_setup, eigen_setup, eigenw,
## errorsarlm, fitted.gmsar, fitted.ME_res, fitted.sarlm,
## fitted.SFResult, fitted.spautolm, get.ClusterOption,
## get.coresOption, get.mcOption, get.VerboseOption,
## get.ZeroPolicyOption, GMargminImage, GMerrorsar, griffith_sone,
## gstsls, Hausman.test, HPDinterval.lagImpact, impacts, intImpacts,
## Jacobian_W, jacobianSetup, l_max, lagmess, lagsarlm, lextrB,
## lextrS, lextrW, lmSLX, logLik.sarlm, logLik.spautolm, LR.sarlm,
## LR1.sarlm, LR1.spautolm, LU_prepermutate_setup, LU_setup,
## Matrix_J_setup, Matrix_setup, mcdet_setup, MCMCsamp, ME, mom_calc,
## mom_calc_int2, moments_setup, powerWeights, predict.sarlm,
## predict.SLX, print.gmsar, print.ME_res, print.sarlm,
## print.sarlm.pred, print.SFResult, print.spautolm, print.stsls,
## print.summary.gmsar, print.summary.sarlm, print.summary.spautolm,
## print.summary.stsls, residuals.gmsar, residuals.sarlm,
## residuals.spautolm, residuals.stsls, sacsarlm, SE_classic_setup,
## SE_interp_setup, SE_whichMin_setup, set.ClusterOption,
## set.coresOption, set.mcOption, set.VerboseOption,
## set.ZeroPolicyOption, similar.listw, spam_setup, spam_update_setup,
## SpatialFiltering, spautolm, spBreg_err, spBreg_lag, spBreg_sac,
## stsls, subgraph_eigenw, summary.gmsar, summary.sarlm,
## summary.spautolm, summary.stsls, trW, vcov.sarlm, Wald1.sarlm
## Welcome to GWmodel version 2.1-4.
## The new version of GWmodel 2.1-4 now is readyLoading required package: tmap
## Loading required package: tidyverse
## -- Attaching packages --------- tidyverse 1.3.0 --
## v tibble 3.0.1 v dplyr 0.8.5
## v tidyr 1.0.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts ------------ tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x tidyr::pack() masks Matrix::pack()
## x tidyr::unpack() masks Matrix::unpack()
## Loading required package: geobr
## Loading required package: cartography
## Loading required package: spatstat
## Loading required package: spatstat.data
## Loading required package: nlme
##
## Attaching package: 'nlme'
##
## The following object is masked from 'package:dplyr':
##
## collapse
##
## Loading required package: rpart
## Registered S3 method overwritten by 'spatstat':
## method from
## print.boxx cli
##
## spatstat 1.64-1 (nickname: 'Help you I can, yes!')
## For an introduction to spatstat, type 'beginner'
##
##
## Attaching package: 'spatstat'
##
## The following objects are masked from 'package:ggpubr':
##
## border, rotate
dictionary <- read_csv2("data/aspatial/Data_Dictionary.csv")
## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
## Warning: Missing column names filled in: 'X6' [6]
## Parsed with column specification:
## cols(
## FIELD = col_character(),
## DESCRIPTION = col_character(),
## REFERENCE = col_character(),
## UNIT = col_character(),
## SOURCE = col_character(),
## X6 = col_character()
## )
NA Check
sum(is.na(dictionary))
## [1] 176
We will not omit theese values as the dictionary is used for reference . Despite this , we will use this dataframe with caution.
brazil <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")
## Parsed with column specification:
## cols(
## .default = col_double(),
## CITY = col_character(),
## STATE = col_character(),
## AREA = col_number(),
## REGIAO_TUR = col_character(),
## CATEGORIA_TUR = col_character(),
## RURAL_URBAN = col_character(),
## GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
glimpse(brazil)
## Rows: 5,573
## Columns: 81
## $ CITY <chr> "Abadia De Goiás", "Abadia Dos Dourados", ...
## $ STATE <chr> "GO", "MG", "GO", "MG", "PA", "CE", "BA", ...
## $ CAPITAL <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IBGE_RES_POP <dbl> 6876, 6704, 15757, 22690, 141100, 10496, 8...
## $ IBGE_RES_POP_BRAS <dbl> 6876, 6704, 15609, 22690, 141040, 10496, 8...
## $ IBGE_RES_POP_ESTR <dbl> 0, 0, 148, 0, 60, 0, 0, 0, 0, 0, 0, 16, 17...
## $ IBGE_DU <dbl> 2137, 2328, 4655, 7694, 31061, 2791, 2572,...
## $ IBGE_DU_URBAN <dbl> 1546, 1481, 3233, 6667, 19057, 1251, 1193,...
## $ IBGE_DU_RURAL <dbl> 591, 847, 1422, 1027, 12004, 1540, 1379, 1...
## $ IBGE_POP <dbl> 5300, 4154, 10656, 18464, 82956, 4538, 372...
## $ IBGE_1 <dbl> 69, 38, 139, 176, 1354, 98, 37, 167, 69, 1...
## $ `IBGE_1-4` <dbl> 318, 207, 650, 856, 5567, 323, 156, 733, 3...
## $ `IBGE_5-9` <dbl> 438, 260, 894, 1233, 7618, 421, 263, 978, ...
## $ `IBGE_10-14` <dbl> 517, 351, 1087, 1539, 8905, 483, 277, 927,...
## $ `IBGE_15-59` <dbl> 3542, 2709, 6896, 11979, 53516, 2631, 2319...
## $ `IBGE_60+` <dbl> 416, 589, 990, 2681, 5996, 582, 673, 803, ...
## $ IBGE_PLANTED_AREA <dbl> 319, 4479, 10307, 1862, 25200, 2598, 895, ...
## $ `IBGE_CROP_PRODUCTION_$` <dbl> 1843, 18017, 33085, 7502, 700872, 5234, 39...
## $ `IDHM Ranking 2010` <dbl> 1689, 2207, 2202, 1994, 3530, 3522, 4086, ...
## $ IDHM <dbl> 0.708, 0.690, 0.690, 0.698, 0.628, 0.628, ...
## $ IDHM_Renda <dbl> 0.687, 0.693, 0.671, 0.720, 0.579, 0.540, ...
## $ IDHM_Longevidade <dbl> 0.830, 0.839, 0.841, 0.848, 0.798, 0.748, ...
## $ IDHM_Educacao <dbl> 0.622, 0.563, 0.579, 0.556, 0.537, 0.612, ...
## $ LONG <dbl> -49.44055, -47.39683, -48.71881, -45.44619...
## $ LAT <dbl> -16.758812, -18.487565, -16.182672, -19.15...
## $ ALT <dbl> 893.60, 753.12, 1017.55, 644.74, 10.12, 40...
## $ PAY_TV <dbl> 360, 77, 227, 1230, 3389, 29, 952, 51, 55,...
## $ FIXED_PHONES <dbl> 842, 296, 720, 1716, 1218, 34, 335, 222, 3...
## $ AREA <dbl> 147.26, 881.06, 1045.13, 1817.07, 1610.65,...
## $ REGIAO_TUR <chr> NA, "Caminhos Do Cerrado", "Região Turísti...
## $ CATEGORIA_TUR <chr> NA, "D", "C", "D", "D", NA, "D", NA, NA, "...
## $ ESTIMATED_POP <dbl> 8583, 6972, 19614, 23223, 156292, 11663, 8...
## $ RURAL_URBAN <chr> "Urbano", "Rural Adjacente", "Rural Adjace...
## $ GVA_AGROPEC <dbl> 6.20, 50524.57, 42.84, 113824.60, 140463.7...
## $ GVA_INDUSTRY <dbl> 27991.25, 25917.70, 16728.30, 31002.62, 58...
## $ GVA_SERVICES <dbl> 74750.32, 62689.23, 138198.58, 172.33, 468...
## $ GVA_PUBLIC <dbl> 36915.04, 28083.79, 63396.20, 86081.41, 48...
## $ ` GVA_TOTAL ` <dbl> 145857.60, 167215.28, 261161.91, 403241.27...
## $ TAXES <dbl> 20554.20, 12873.50, 26822.58, 26994.09, 95...
## $ GDP <dbl> 166.41, 180.09, 287984.49, 430235.36, 1249...
## $ POP_GDP <dbl> 8053, 7037, 18427, 23574, 151934, 11483, 9...
## $ GDP_CAPITA <dbl> 20664.57, 25591.70, 15628.40, 18250.42, 82...
## $ GVA_MAIN <chr> "Demais serviços", "Demais serviços", "Dem...
## $ MUN_EXPENDIT <dbl> 28227691, 17909274, 37513019, NA, NA, NA, ...
## $ COMP_TOT <dbl> 284, 476, 288, 621, 931, 86, 191, 87, 285,...
## $ COMP_A <dbl> 5, 6, 5, 18, 4, 1, 6, 2, 5, 2, 0, 8, 3, 1,...
## $ COMP_B <dbl> 1, 6, 9, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, ...
## $ COMP_C <dbl> 56, 30, 26, 40, 43, 4, 8, 3, 20, 4, 9, 40,...
## $ COMP_D <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ COMP_E <dbl> 2, 2, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 2, ...
## $ COMP_F <dbl> 29, 34, 7, 20, 27, 6, 4, 0, 10, 2, 0, 25, ...
## $ COMP_G <dbl> 110, 190, 117, 303, 500, 48, 97, 71, 133, ...
## $ COMP_H <dbl> 26, 70, 12, 62, 16, 2, 5, 0, 18, 8, 1, 67,...
## $ COMP_I <dbl> 4, 28, 57, 30, 31, 10, 5, 1, 14, 3, 0, 25,...
## $ COMP_J <dbl> 5, 11, 2, 9, 6, 2, 3, 1, 8, 1, 1, 9, 5, 14...
## $ COMP_K <dbl> 0, 0, 1, 6, 1, 0, 1, 0, 0, 1, 0, 4, 3, 3, ...
## $ COMP_L <dbl> 2, 4, 0, 4, 1, 0, 0, 0, 4, 0, 0, 7, 4, 4, ...
## $ COMP_M <dbl> 10, 15, 7, 28, 22, 2, 5, 0, 11, 4, 2, 26, ...
## $ COMP_N <dbl> 12, 29, 15, 27, 16, 3, 5, 1, 26, 0, 1, 16,...
## $ COMP_O <dbl> 4, 2, 3, 2, 2, 2, 2, 2, 2, 2, 6, 2, 4, 2, ...
## $ COMP_P <dbl> 6, 9, 11, 15, 155, 0, 8, 0, 8, 1, 6, 14, 1...
## $ COMP_Q <dbl> 6, 14, 5, 19, 33, 2, 1, 2, 9, 3, 0, 13, 22...
## $ COMP_R <dbl> 1, 6, 1, 9, 15, 0, 2, 0, 4, 0, 0, 4, 6, 6,...
## $ COMP_S <dbl> 5, 19, 8, 27, 56, 4, 38, 4, 12, 3, 4, 23, ...
## $ COMP_T <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COMP_U <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ HOTELS <dbl> NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, ...
## $ BEDS <dbl> NA, NA, 34, NA, NA, NA, 24, NA, NA, NA, NA...
## $ Pr_Agencies <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Agencies <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Bank <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Bank <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Assets <dbl> NA, NA, 33724584, 44974716, 76181384, NA, ...
## $ Pu_Assets <dbl> NA, NA, 67091904, 371922572, 800078483, NA...
## $ Cars <dbl> 2158, 2227, 2838, 6928, 5277, 553, 896, 61...
## $ Motorcycles <dbl> 1246, 1142, 1426, 2953, 25661, 1674, 696, ...
## $ Wheeled_tractor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, ...
## $ UBER <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MAC <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ `WAL-MART` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ POST_OFFICES <dbl> 1, 1, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...
Rows that have NA GDP_Capita values
brazil[!complete.cases(brazil$GDP_CAPITA), ]
## # A tibble: 3 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Lago~ RS 0 NA NA NA NA
## 2 Sant~ BA 0 NA NA NA NA
## 3 São ~ PE 0 NA NA NA NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
We will not be omiting this rows yet as GDP_CAPITA is the dependent variable of our analysis and more investigation will be done later on.
Rows that do not have LAT LONG data
brazil[!complete.cases(brazil$LAT), ]
## # A tibble: 9 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Baln~ SC 0 NA NA NA NA
## 2 Lago~ RS 0 NA NA NA NA
## 3 Moju~ PA 0 NA NA NA NA
## 4 Para~ MS 0 NA NA NA NA
## 5 Pesc~ SC 0 NA NA NA NA
## 6 Pinh~ RS 0 2130 2130 0 745
## 7 Pint~ RS 0 NA NA NA NA
## 8 Sant~ BA 0 9648 9648 0 2891
## 9 São ~ PE 0 NA NA NA NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
For these rows, I will be inputting the data manually later on in the analysis, I will filter the valid cities with LAT LONG data and invalid cities without
valid <- brazil %>% filter(!is.na(LAT))
invalid <- brazil %>% filter(is.na(LAT))
convert to sf data frame
valid <- st_as_sf(valid,
coords = c("LONG", "LAT"),
crs=4674)
Brazil 2016 municipality boundary
mun <- read_municipality(year=2016)
## Using year 2016
## Loading data for the whole country. This might take a few minutes.
##
|
| | 0%
|
|=== | 4%
|
|===== | 7%
|
|======== | 11%
|
|========== | 15%
|
|============= | 19%
|
|================ | 22%
|
|================== | 26%
|
|===================== | 30%
|
|======================= | 33%
|
|========================== | 37%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================== | 48%
|
|==================================== | 52%
|
|======================================= | 56%
|
|========================================= | 59%
|
|============================================ | 63%
|
|=============================================== | 67%
|
|================================================= | 70%
|
|==================================================== | 74%
|
|====================================================== | 78%
|
|========================================================= | 81%
|
|============================================================ | 85%
|
|============================================================== | 89%
|
|================================================================= | 93%
|
|=================================================================== | 96%
|
|======================================================================| 100%
st_crs(mun)
## Coordinate Reference System:
## User input: SIRGAS 2000
## wkt:
## GEOGCRS["SIRGAS 2000",
## DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["Latin America - SIRGAS 2000 by country"],
## BBOX[-59.87,-122.19,32.72,-25.28]],
## ID["EPSG",4674]]
all(st_is_valid(mun))
## [1] FALSE
muni_invalid <- mun %>% filter(!st_is_valid(mun))
st_is_valid(muni_invalid, reason = TRUE)
## [1] "Nested shells[-46.999541435 -24.3495746469999]"
mun <- st_make_valid(mun)
checking for invalid polygons again
check <- function(target_st) {
validity <- st_is_valid(target_st)
NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
Invalid_rows <- which(validity==FALSE)
print(paste("For:", deparse(substitute(target_st))))
print(paste("Number of Invalid polygons:", length(Invalid_rows)))
}
check(mun)
## [1] "For: mun"
## [1] "Number of Invalid polygons: 0"
Checking the municipality unique identifier
length(unique(mun$name_muni))
## [1] 5299
length(unique(mun$code_muni))
## [1] 5572
From the results , we can see that there are some municipality with the same names. We should use code_muni as the unique identifier.
using st_area function to obtain area of municipility
mun <- mun %>%
mutate(area = st_area(mun))
merged brazil cities to municipal polygon
brazil_mun <- st_join(valid,mun,join=st_within)
## although coordinates are longitude/latitude, st_within assumes that they are planar
By joining the data, the missing GDP_CAPITA is addressed
merged municipal polygon to brazil cities
mun_brazil <- st_join(mun,valid)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
For rows that did not have a successful join , we wil filter them out and anaylse them . We will also be verifying the successful joins
success <- mun_brazil %>% filter(!is.na(CITY))
unsucess <- mun_brazil %>% filter(is.na(CITY)) %>% select(code_muni, name_muni, code_state, abbrev_state, geom)
For verifying the successfull joins , the CITY variable from our brazil dataframe is supposed to tally with name_muni from the municipility data from geobr and there should be no rows with dfferent city names
different <- success %>% filter(name_muni != CITY) %>% select(name_muni, CITY, abbrev_state, STATE)
print(different)
## Simple feature collection with 48 features and 4 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -63.19642 ymin: -31.33209 xmax: -35.54424 ymax: -3.917303
## geographic CRS: SIRGAS 2000
## First 10 features:
## name_muni CITY abbrev_state STATE
## 1 Alta Floresta D'oeste Alta Floresta D'Oeste RO RO
## 2 Espigão D'oeste Espigão D'Oeste RO RO
## 3 Machadinho D'oeste Machadinho D'Oeste RO RO
## 4 Nova Brasilândia D'oeste Nova Brasilândia D'Oeste RO RO
## 5 Santa Luzia D'oeste Santa Luzia D'Oeste RO RO
## 6 Alvorada D'oeste Alvorada D'Oeste RO RO
## 7 São Felipe D'oeste São Felipe D'Oeste RO RO
## 8 Pau D'arco Pau D'Arco PA PA
## 9 Pau D'arco Pau D'Arco TO TO
## 10 Olho D'água Das Cunhãs Olho D'Água Das Cunhãs MA MA
## geom
## 1 POLYGON ((-62.19465 -11.827...
## 2 POLYGON ((-60.94827 -10.988...
## 3 POLYGON ((-62.22055 -8.5908...
## 4 POLYGON ((-61.9569 -11.2959...
## 5 POLYGON ((-61.53069 -11.856...
## 6 POLYGON ((-62.18776 -11.068...
## 7 POLYGON ((-61.53458 -11.743...
## 8 MULTIPOLYGON (((-50.3382 -7...
## 9 POLYGON ((-49.23783 -7.4883...
## 10 MULTIPOLYGON (((-45.07365 -...
From the output , the cities have the same name and state .Hence , the join was successful.
Next , we wll view that unsuccessful joins and the invalid brazilian cities data
unsucess
## Simple feature collection with 8 features and 4 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -54.83431 ymin: -33.61653 xmax: -48.81431 ymax: -2.593611
## geographic CRS: SIRGAS 2000
## code_muni name_muni code_state abbrev_state
## 1 1504752 Mojuí Dos Campos 15 PA
## 2 4212650 Pescaria Brava 42 SC
## 3 4220000 Balneário Rincão 42 SC
## 4 4300001 Lagoa Mirim 43 RS
## 5 4300002 Lagoa Dos Patos 43 RS
## 6 4314464 Pinhal Da Serra 43 RS
## 7 4314548 Pinto Bandeira 43 RS
## 8 5006275 Paraíso Das Águas 50 MS
## geom
## 1 MULTIPOLYGON (((-54.58777 -...
## 2 MULTIPOLYGON (((-48.92608 -...
## 3 MULTIPOLYGON (((-49.20933 -...
## 4 POLYGON ((-52.62241 -32.146...
## 5 POLYGON ((-51.2719 -30.0389...
## 6 POLYGON ((-51.29481 -27.725...
## 7 POLYGON ((-51.45725 -29.029...
## 8 POLYGON ((-52.90027 -18.782...
invalid
## # A tibble: 9 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Baln~ SC 0 NA NA NA NA
## 2 Lago~ RS 0 NA NA NA NA
## 3 Moju~ PA 0 NA NA NA NA
## 4 Para~ MS 0 NA NA NA NA
## 5 Pesc~ SC 0 NA NA NA NA
## 6 Pinh~ RS 0 2130 2130 0 745
## 7 Pint~ RS 0 NA NA NA NA
## 8 Sant~ BA 0 9648 9648 0 2891
## 9 São ~ PE 0 NA NA NA NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
When comparing the 2 outputs , there are 3 findings to take note.
sao <- invalid%>%
filter(CITY == "São Caetano")
sao
## # A tibble: 1 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 São ~ PE 0 NA NA NA NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
The GDP_CAPITA is also NA , we will remove this city from our study.
santa <- invalid%>%
filter(CITY == "Santa Terezinha")
santa
## # A tibble: 1 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sant~ BA 0 9648 9648 0 2891
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
The GDP_CAPITA is also NA , we will remove this city from our study.
invalid_fixed <- invalid %>%
filter(CITY != "Santa Terezinha") %>% filter(CITY != "São Caetano")
From this the cities that do not have LONG LAT are : Balneário Rincão Lagoa Dos Patos Mojuí Dos Campos Paraíso Das Águas Pescaria Brava Pinhal Da Serra Pinto Bandeira Santa Terezinha São Caetano .
I will manually input the data with reference to google maps.
invalid_fixed$LONG[invalid_fixed$CITY == "Balneário Rincão"] <- -49.2361
invalid_fixed$LAT[invalid_fixed$CITY == "Balneário Rincão"] <- -28.8344
invalid_fixed$LONG[invalid_fixed$CITY == "Lagoa Dos Patos"] <- --51.4725
invalid_fixed$LAT[invalid_fixed$CITY == "Lagoa Dos Patos"] <- -31.0697
invalid_fixed$LONG[invalid_fixed$CITY == "Mojuí Dos Campos"] <- -54.6431
invalid_fixed$LAT[invalid_fixed$CITY == "Mojuí Dos Campos"] <- -2.68472
invalid_fixed$LONG[invalid_fixed$CITY == "Paraíso Das Águas"] <- -53.0102
invalid_fixed$LAT[invalid_fixed$CITY == "Paraíso Das Águas"] <- -19.0257
invalid_fixed$LONG[invalid_fixed$CITY == "Pescaria Brava"] <- -48.8956
invalid_fixed$LAT[invalid_fixed$CITY == "Pescaria Brava"] <- -28.4247
invalid_fixed$LONG[invalid_fixed$CITY == "Pinhal Da Serra"] <- -51.1733
invalid_fixed$LAT[invalid_fixed$CITY == "Pinhal Da Serra"] <- -27.8747
invalid_fixed$LONG[invalid_fixed$CITY == "Pinto Bandeira"] <- -51.4503
invalid_fixed$LAT[invalid_fixed$CITY == "Pinto Bandeira"] <- -29.0978
invalid_fixed$LONG[invalid_fixed$CITY == "Santa Terezinha"] <- -39.5184
invalid_fixed$LAT[invalid_fixed$CITY == "Santa Terezinha"] <- -12.7498
invalid_fixed$LONG[invalid_fixed$CITY == "São Caetano"] <- -36.1459
invalid_fixed$LAT[invalid_fixed$CITY == "São Caetano"] <- -8.33
invalid_fixed_sf <- st_as_sf(invalid_fixed,
coords = c("LONG", "LAT"),
crs=4674)
centroid <- st_centroid(unsucess)
## Warning in st_centroid.sf(unsucess): st_centroid assumes attributes are constant
## over geometries of x
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for longitude/
## latitude data
getmap <- function(osm) {
tm_shape(osm)+
tm_layout(legend.outside = TRUE)+
tm_rgb()+
tm_shape(unsucess)+
tm_borders(alpha=0.5, col = "darkblue", lwd = 2, lty="longdash")+
tm_shape(centroid)+
tm_bubbles(col="yellow", size=2, alpha=0.5)+
tm_shape(invalid_fixed_sf)+
tm_bubbles(col="green", size=2, alpha=0.5)
}
The green circle represents the center of the municipilty from Google Maps while the yellow circle represents the center of the centriod.
Mojuí Dos Campos
mojui <- unsucess%>%
filter(code_muni == "1504752")
mojui<-getTiles(x = mojui, type = "osm")
getmap(mojui)
## Warning: Raster values found that are outside the range [0, 255]
Pinhal Da Serra
pinhal <- unsucess%>%
filter(code_muni == "4314464")
pinhal<-getTiles(x = pinhal, type = "osm")
getmap(pinhal)
## Warning: Raster values found that are outside the range [0, 255]
Pinto Bandeira
pinto <- unsucess%>%
filter(code_muni == "4314548")
pinto<-getTiles(x = pinto, type = "osm")
getmap(pinto)
## Warning: Raster values found that are outside the range [0, 255]
Pescaria Brava
pes <- unsucess%>%
filter(code_muni == "4212650")
pes<-getTiles(x = pes, type = "osm")
getmap(pes)
## Warning: Raster values found that are outside the range [0, 255]
Balneário Rincão
bal <- unsucess%>%
filter(code_muni == "4220000")
bal<-getTiles(x = bal, type = "osm")
getmap(bal)
## Warning: Raster values found that are outside the range [0, 255]
Paraíso Das Águas
par <- unsucess%>%
filter(code_muni == "5006275")
par<-getTiles(x = par, type = "osm")
getmap(par)
## Warning: Raster values found that are outside the range [0, 255]
From these plots , we can see that using online data shows the center of economic activity which is beneficial for analysis.
merged_fixed <- st_join(invalid_fixed_sf, mun,join=st_within)
## although coordinates are longitude/latitude, st_within assumes that they are planar
merged_fixed_sf <- rbind(brazil_mun, merged_fixed)
NA check
merged_fixed_sf[!complete.cases(merged_fixed_sf$GDP_CAPITA), ]
## Simple feature collection with 2 features and 84 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -39.52114 ymin: -31.0697 xmax: 51.4725 ymax: -12.77285
## geographic CRS: SIRGAS 2000
## # A tibble: 2 x 85
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sant~ BA 0 NA NA NA NA
## 2 Lago~ RS 0 NA NA NA NA
## # ... with 78 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## # AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>, ESTIMATED_POP <dbl>,
## # RURAL_URBAN <chr>, GVA_AGROPEC <dbl>, GVA_INDUSTRY <dbl>,
## # GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL ` <dbl>, TAXES <dbl>,
## # GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>, GVA_MAIN <chr>,
## # MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>, COMP_B <dbl>,
## # COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>, COMP_G <dbl>,
## # COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>, COMP_L <dbl>,
## # COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>, COMP_Q <dbl>,
## # COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>, HOTELS <dbl>,
## # BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>, Pr_Bank <dbl>,
## # Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>, geometry <POINT [°]>,
## # code_muni <dbl>, name_muni <fct>, code_state <fct>, abbrev_state <fct>,
## # area [m^2]
Since Santa Terezinha in state BA is in the BRAZIL_CITIES.csv file but not in the municipality data downloaded through geobr and lagoa dos patos is a lagoon , we will be removing these 2 data.
merged_fixed_sf <- merged_fixed_sf %>% filter(!is.na(GDP_CAPITA))
Next , we will join the data into our municipility data
merged_fixed <- as.data.frame(merged_fixed_sf)
muni_merged <- left_join(mun, merged_fixed, "code_muni") %>% filter(!is.na(CITY))
NA check
muni_merged[!complete.cases(muni_merged$GDP_CAPITA), ]
## Simple feature collection with 0 features and 88 fields
## Active geometry column: geom
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## geographic CRS: SIRGAS 2000
## [1] code_muni name_muni.x code_state.x
## [4] abbrev_state.x area.x CITY
## [7] STATE CAPITAL IBGE_RES_POP
## [10] IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU
## [13] IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## [16] IBGE_1 IBGE_1-4 IBGE_5-9
## [19] IBGE_10-14 IBGE_15-59 IBGE_60+
## [22] IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010
## [25] IDHM IDHM_Renda IDHM_Longevidade
## [28] IDHM_Educacao ALT PAY_TV
## [31] FIXED_PHONES AREA REGIAO_TUR
## [34] CATEGORIA_TUR ESTIMATED_POP RURAL_URBAN
## [37] GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## [40] GVA_PUBLIC GVA_TOTAL TAXES
## [43] GDP POP_GDP GDP_CAPITA
## [46] GVA_MAIN MUN_EXPENDIT COMP_TOT
## [49] COMP_A COMP_B COMP_C
## [52] COMP_D COMP_E COMP_F
## [55] COMP_G COMP_H COMP_I
## [58] COMP_J COMP_K COMP_L
## [61] COMP_M COMP_N COMP_O
## [64] COMP_P COMP_Q COMP_R
## [67] COMP_S COMP_T COMP_U
## [70] HOTELS BEDS Pr_Agencies
## [73] Pu_Agencies Pr_Bank Pu_Bank
## [76] Pr_Assets Pu_Assets Cars
## [79] Motorcycles Wheeled_tractor UBER
## [82] MAC WAL-MART POST_OFFICES
## [85] name_muni.y code_state.y abbrev_state.y
## [88] area.y geom geometry
## <0 rows> (or 0-length row.names)
tm_shape(muni_merged)+
tm_fill("GDP_CAPITA",
n = 5,
style = "fisher") +
tm_layout(scale=0.6)
From the plot , we can see that the highest GDP per capita are around the satelight cities around Sao Paulo rather than the main city itself. We can also see concentrations of higher GDP per capita in the inland areas like Campos De Júlio. This could be due to a lower population while the region is still generating a large amount of production.
ggplot(data=muni_merged, aes(x=`GDP_CAPITA`)) +
geom_histogram(bins=20, color="black", fill="light blue")
The plot reveals that GDP_CAPITA is right skewed distribution. Therefore, there are many brazil municipality earns much lower GDP_CAPITA which indicates the vast difference from the richest to the poorest municipality.
Human Development Index
dictionary %>% filter(FIELD %in% c("IDHM Ranking 2010", "IDHM", "IDHM_Renda", "IDHM_Longevidade", "IDHM_Educacao"))
## # A tibble: 4 x 6
## FIELD DESCRIPTION REFERENCE UNIT SOURCE X6
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 IDHM HDI Human Devel~ 2010 - http://www.br.undp.org/conte~ <NA>
## 2 IDHM_Ren~ HDI GNI Index 2010 - http://www.br.undp.org/conte~ <NA>
## 3 IDHM_Lon~ HDI Life Expect~ 2010 - http://www.br.undp.org/conte~ <NA>
## 4 IDHM_Edu~ HDI Education i~ 2010 - http://www.br.undp.org/conte~ <NA>
Replacing the NA values with 0 will make those rows increase in “value”. Instead , we will use the last position for IDHM Ranking 2010 and the minimum value for the rest of the index variables.
merged_fixed_sf$"IDHM Ranking 2010" <- merged_fixed_sf$"IDHM Ranking 2010" %>% replace_na(5569)
merged_fixed_sf$"IDHM" <- merged_fixed_sf$"IDHM" %>% replace_na(0.4180)
merged_fixed_sf$"IDHM_Renda" <- merged_fixed_sf$"IDHM_Renda" %>% replace_na(0.4000)
merged_fixed_sf$"IDHM_Longevidade" <- merged_fixed_sf$"IDHM_Longevidade" %>% replace_na(0.6720)
merged_fixed_sf$"IDHM_Educacao" <- merged_fixed_sf$"IDHM_Longevidade" %>% replace_na(0.2070)
dictionary %>% filter(FIELD %in% c("AREA"))
## # A tibble: 1 x 6
## FIELD DESCRIPTION REFERENCE UNIT SOURCE X6
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 AREA City area (squa~ 2018 1 squared Kilome~ https://www.ibge.gov~ <NA>
From the output, there is 1 row with NA for area. We will use the area computed with st_area instead
Lastly , we will replace will numerical variables to 0 and all categorical variables to .
merged_fixed_sf <- merged_fixed_sf %>%
mutate_if(is.numeric , replace_na, replace = 0)
merged_fixed_sf <- merged_fixed_sf %>% replace(is.na(.), "NA")
any(is.na(merged_fixed_sf))
## [1] FALSE
merged_derived <- merged_fixed_sf %>%
mutate(area = area / 100000) %>%
mutate(`Tax_To_GDP` = `TAXES`/`GDP`) %>%
mutate(`Active`=`IBGE_15-59`/`IBGE_POP`)%>%
mutate(`Cars_p` = `Cars` /`ESTIMATED_POP` ) %>%
mutate(`Motorcycles_p` = `Motorcycles` /`ESTIMATED_POP`) %>%
mutate(`Wheeled_tractor_p` = `Wheeled_tractor` /`ESTIMATED_POP`)%>%
mutate(`GVA_AGROPEC_p` = `GVA_AGROPEC`/`POP_GDP`)%>%
mutate(`GVA_SERVICES_p` = `GVA_SERVICES`/`POP_GDP`)%>%
mutate(`GVA_PUBLIC_p` = `GVA_PUBLIC`/`POP_GDP`)%>%
mutate(`GVA_INDUSTRY_p` = `GVA_INDUSTRY`/`POP_GDP`) %>%
mutate(`Pop_density` = `IBGE_POP`/`area`)%>%
mutate(Brazilian_Foreigners_Ratio = `IBGE_RES_POP_BRAS`/`IBGE_RES_POP_ESTR`) %>%
mutate(DU_density = IBGE_DU/IBGE_POP) %>%
mutate(DU_Urban_Rural_Ratio = IBGE_DU_URBAN/IBGE_DU_RURAL) %>%
mutate(planted_ratio = IBGE_PLANTED_AREA/area) %>%
mutate(planted_value_hectare = `IBGE_CROP_PRODUCTION_$`/IBGE_PLANTED_AREA) %>%
mutate(PAY_TV = PAY_TV / POP_GDP) %>%
mutate(FIXED_PHONES = FIXED_PHONES / POP_GDP) %>%
mutate(expenditures_capita = MUN_EXPENDIT/POP_GDP) %>%
mutate(COMP_A = COMP_A/POP_GDP) %>%
mutate(COMP_B = COMP_B/POP_GDP) %>%
mutate(COMP_C = COMP_C/POP_GDP) %>%
mutate(COMP_D = COMP_D/POP_GDP) %>%
mutate(COMP_E = COMP_E/POP_GDP) %>%
mutate(COMP_F = COMP_F/POP_GDP) %>%
mutate(COMP_G = COMP_G/POP_GDP) %>%
mutate(COMP_H = COMP_H/POP_GDP) %>%
mutate(COMP_I = COMP_I/POP_GDP) %>%
mutate(COMP_J = COMP_J/POP_GDP) %>%
mutate(COMP_K = COMP_K/POP_GDP) %>%
mutate(COMP_L = COMP_L/POP_GDP) %>%
mutate(COMP_M = COMP_M/POP_GDP) %>%
mutate(COMP_N = COMP_N/POP_GDP) %>%
mutate(COMP_O = COMP_O/POP_GDP) %>%
mutate(COMP_P = COMP_P/POP_GDP) %>%
mutate(COMP_Q = COMP_Q/POP_GDP) %>%
mutate(COMP_R = COMP_R/POP_GDP) %>%
mutate(COMP_S = COMP_S/POP_GDP) %>%
mutate(COMP_T = COMP_T/POP_GDP) %>%
mutate(COMP_U = COMP_U/POP_GDP) %>%
mutate(BEDS = BEDS/HOTELS) %>%
mutate(HOTELS = HOTELS * 100000/ESTIMATED_POP) %>%
mutate(Pr_Assets = Pr_Assets/Pr_Bank) %>%
mutate(Pu_Assets = Pu_Assets/Pu_Bank) %>%
mutate(Pr_Bank = Pr_Bank* 100000/ESTIMATED_POP) %>%
mutate(Pu_Bank = Pu_Bank* 100000/ESTIMATED_POP) %>%
mutate(PAY_TV = PAY_TV / ESTIMATED_POP) %>%
mutate(FIXED_PHONES = FIXED_PHONES / ESTIMATED_POP) %>%
mutate(MAC = MAC * 100000/ESTIMATED_POP) %>%
mutate(POST_OFFICES = POST_OFFICES* 100000/ESTIMATED_POP) %>%
mutate(walmart = `WAL-MART`* 100000/ESTIMATED_POP)
The population data used are different as we will use the ones closest to the date of the source. POP_GDP:2016, ESTIMATED_POP:2018 , IBGE_POP:2010
brazil_var <- merged_derived%>% select(code_muni, CITY, STATE, GDP_CAPITA, CAPITAL, Pop_density, Brazilian_Foreigners_Ratio, Active, DU_density, DU_Urban_Rural_Ratio,planted_ratio, planted_value_hectare, IDHM,IDHM_Renda, IDHM_Longevidade, IDHM_Educacao, PAY_TV, FIXED_PHONES, expenditures_capita, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, BEDS, HOTELS, Pr_Assets, Pu_Assets, Pr_Bank,Pu_Bank, Cars_p,Motorcycles_p, Wheeled_tractor_p, MAC, walmart, REGIAO_TUR, CATEGORIA_TUR, RURAL_URBAN, GVA_AGROPEC_p,GVA_INDUSTRY_p,GVA_SERVICES_p,GVA_PUBLIC_p)
We first need to check the spread of the variables
length(unique(brazil_var$CATEGORIA_TUR))
## [1] 6
length(unique(brazil_var$RURAL_URBAN))
## [1] 6
length(unique(brazil_var$REGIAO_TUR))
## [1] 323
eliminate REGIA_TUR as it has too many unique types. Now , we will look at the spread of the remaining 2 variables
table(brazil_var$RURAL_URBAN)
##
## Intermediário Adjacente Intermediário Remoto Rural Adjacente
## 686 60 3039
## Rural Remoto Sem classificação Urbano
## 323 5 1456
table(brazil_var$CATEGORIA_TUR)
##
## A B C D E NA
## 51 168 521 1891 653 2285
we will use RURAL_URBAN as CATEGORIA_TUR has many NA values which is not ideal
Besides REGIAO_TUR and CATEGORIA_TUR, , the following rows will be removed as most of their values are 0 which will not be significant in our analysis.
brazil_var <- brazil_var%>%
select(!REGIAO_TUR) %>%
select(!CATEGORIA_TUR) %>%
select(!COMP_T) %>%
select(!COMP_U) %>%
select(!MAC) %>%
select(!walmart) %>%
mutate(value = 1) %>%
spread(RURAL_URBAN, value, fill=0)
Removing NA
muni_merged.sf <- brazil_var %>%
replace(is.na(.), 0) %>%
mutate_if(is.numeric, function(x) ifelse(is.infinite(x), 0, x))
muni_merged_new <- st_set_geometry(brazil_var, NULL)
muni_merged_new <- brazil_var %>% replace(is.na(.), 0) %>%
mutate_if(is.numeric, function(x) ifelse(is.infinite(x), 0, x))
muni_merged_new<- as.data.frame(muni_merged_new)
We will be plotting some of the factors to see the general skew of the variables.
GVA ratios
as.data.frame(brazil_var)[48:51] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Company Type ratios
as.data.frame(brazil_var)[20:38] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Vehicles
as.data.frame(brazil_var)[45:47] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Overall, the variables are right skewed
To get a better analysis to identify the relationships between the various factors , I will know conduct a correlation analysis to better understand the relationships between the respective factors.The code chunk below is used to plot a scatterplot matrix of the relationship between the independent variables in muni_merged data.frame. To avoid multticollinearity, we will plot the scatterplot to better understand the bivariate relationship.
corrplot(cor(as.data.frame(muni_merged_new[,4:51])), diag = FALSE, order = "alphabet",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper",number.cex = 0.2)
We will be setting a threshold of >0.75 as it may lead to issues when running multi-regression analysis. Further discussion can be found here https://www.researchgate.net/post/How_can_I_avoid_multicollinearity
Hence , the following variables will be removed due to their high correlation
muni_merged_new <- muni_merged_new %>%
select(!IDHM) %>%
select(!IDHM_Educacao) %>%
select(!IDHM_Renda) %>%
select(!IDHM_Longevidade)%>%
select(!COMP_F)%>%
select(!COMP_G)%>%
select(!COMP_K)%>%
select(!GVA_SERVICES_p)%>%
select(!GVA_AGROPEC_p)%>%
select(!GVA_INDUSTRY_p)%>%
select(!Cars_p)%>%
select(!COMP_M)%>%
select(!COMP_H)%>%
select(!COMP_I)%>%
select(!COMP_L)%>%
select(!COMP_P)%>%
select(!COMP_R)%>%
select(!COMP_O)%>%
select(!PAY_TV)
corrplot(cor(as.data.frame(muni_merged_new[,4:32])), diag = FALSE, order = "alphabet",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper",number.cex = 0.2)
Now, all our values fall below the 0.75 correlation coefficient.
We will be using the following hypothesis.
H0 : The GDP per capita are randomly distributed H1 : The GDP per capita are not randomly distributed
Our confidence level will be set at 95% and our alpha value is 0.05.
muni_merged_new <- muni_merged_new %>%
rename(RURAL_URBAN_1 = `Intermediário Adjacente`) %>%
rename(RURAL_URBAN_2 = `Intermediário Remoto`) %>%
rename(RURAL_URBAN_3 = `Rural Adjacente`) %>%
rename(RURAL_URBAN_4 = `Rural Remoto`) %>%
rename(RURAL_URBAN_5 = `Sem classificação`)
muni.mlr <- lm(formula = GDP_CAPITA ~ CAPITAL + Pop_density + Brazilian_Foreigners_Ratio + Active + DU_density + DU_Urban_Rural_Ratio + planted_ratio + planted_value_hectare + FIXED_PHONES + expenditures_capita + COMP_A + COMP_B + COMP_C + COMP_D + COMP_E + COMP_J + COMP_N + COMP_Q + COMP_S + BEDS + HOTELS + Pr_Assets + Pu_Assets + Pr_Bank + Pu_Bank + Motorcycles_p + Wheeled_tractor_p + RURAL_URBAN_1 + RURAL_URBAN_2 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_5 + Urbano +GVA_PUBLIC_p, data=muni_merged_new)
summary(muni.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ CAPITAL + Pop_density + Brazilian_Foreigners_Ratio +
## Active + DU_density + DU_Urban_Rural_Ratio + planted_ratio +
## planted_value_hectare + FIXED_PHONES + expenditures_capita +
## COMP_A + COMP_B + COMP_C + COMP_D + COMP_E + COMP_J + COMP_N +
## COMP_Q + COMP_S + BEDS + HOTELS + Pr_Assets + Pu_Assets +
## Pr_Bank + Pu_Bank + Motorcycles_p + Wheeled_tractor_p + RURAL_URBAN_1 +
## RURAL_URBAN_2 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_5 +
## Urbano + GVA_PUBLIC_p, data = muni_merged_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40531 -6984 -2150 3406 272088
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.699e+04 5.934e+03 -16.345 < 2e-16 ***
## CAPITAL -6.843e+03 3.497e+03 -1.957 0.050415 .
## Pop_density 3.786e+00 4.653e+00 0.814 0.415896
## Brazilian_Foreigners_Ratio 7.384e-02 1.517e-01 0.487 0.626568
## Active 1.674e+05 9.650e+03 17.343 < 2e-16 ***
## DU_density -1.722e+03 7.965e+02 -2.162 0.030653 *
## DU_Urban_Rural_Ratio 5.342e-01 1.690e+00 0.316 0.751922
## planted_ratio 1.234e+03 9.024e+01 13.676 < 2e-16 ***
## planted_value_hectare 8.645e+01 3.674e+01 2.353 0.018643 *
## FIXED_PHONES -2.136e+07 2.323e+07 -0.919 0.357909
## expenditures_capita 2.648e+00 1.498e-01 17.674 < 2e-16 ***
## COMP_A 1.495e+05 3.355e+04 4.457 8.47e-06 ***
## COMP_B -7.400e+05 6.404e+05 -1.155 0.247974
## COMP_C -1.463e+05 1.162e+05 -1.259 0.208097
## COMP_D 1.657e+07 2.175e+06 7.619 2.98e-14 ***
## COMP_E 8.334e+06 2.201e+06 3.787 0.000154 ***
## COMP_J 1.345e+05 4.416e+05 0.305 0.760693
## COMP_N 1.268e+06 2.682e+05 4.728 2.32e-06 ***
## COMP_Q 2.005e+06 5.383e+05 3.725 0.000197 ***
## COMP_S -5.134e+05 1.662e+05 -3.089 0.002022 **
## BEDS 1.536e+01 6.764e+00 2.270 0.023226 *
## HOTELS 1.108e+01 1.746e+01 0.635 0.525548
## Pr_Assets 1.010e-07 2.374e-08 4.255 2.13e-05 ***
## Pu_Assets 2.274e-08 8.058e-09 2.822 0.004782 **
## Pr_Bank 1.215e+02 4.394e+01 2.766 0.005702 **
## Pu_Bank 1.958e+02 3.127e+01 6.261 4.10e-10 ***
## Motorcycles_p -2.060e+04 3.473e+03 -5.930 3.21e-09 ***
## Wheeled_tractor_p 1.386e+06 5.149e+05 2.692 0.007122 **
## RURAL_URBAN_1 -2.630e+03 8.064e+02 -3.262 0.001113 **
## RURAL_URBAN_2 5.729e+03 2.169e+03 2.642 0.008265 **
## RURAL_URBAN_3 -1.611e+03 6.842e+02 -2.354 0.018608 *
## RURAL_URBAN_4 2.410e+03 1.117e+03 2.158 0.030972 *
## RURAL_URBAN_5 1.067e+05 9.496e+03 11.238 < 2e-16 ***
## Urbano NA NA NA NA
## GVA_PUBLIC_p 1.465e+03 1.161e+02 12.618 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16150 on 5535 degrees of freedom
## Multiple R-squared: 0.3729, Adjusted R-squared: 0.3691
## F-statistic: 99.72 on 33 and 5535 DF, p-value: < 2.2e-16
From the summary , we will be removing variables which are statistically insignificant.
muni.mlr <- lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E + COMP_N + COMP_Q + COMP_S+BEDS+ Pu_Assets + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=muni_merged_new)
summary(muni.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D +
## COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets +
## Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +
## RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 +
## RURAL_URBAN_5 + GVA_PUBLIC_p, data = muni_merged_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39927 -6996 -2195 3343 272207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.356e+04 5.595e+03 -16.721 < 2e-16 ***
## Active 1.622e+05 9.069e+03 17.886 < 2e-16 ***
## planted_ratio 1.236e+03 8.927e+01 13.847 < 2e-16 ***
## DU_density -1.666e+03 7.934e+02 -2.100 0.035773 *
## planted_value_hectare 8.305e+01 3.615e+01 2.297 0.021638 *
## expenditures_capita 2.598e+00 1.454e-01 17.869 < 2e-16 ***
## COMP_A 1.458e+05 3.296e+04 4.422 9.94e-06 ***
## COMP_D 1.628e+07 2.170e+06 7.499 7.43e-14 ***
## COMP_E 7.816e+06 2.164e+06 3.612 0.000307 ***
## COMP_N 1.257e+06 2.520e+05 4.986 6.37e-07 ***
## COMP_Q 1.921e+06 5.261e+05 3.651 0.000264 ***
## COMP_S -5.653e+05 1.638e+05 -3.452 0.000561 ***
## BEDS 1.556e+01 6.586e+00 2.363 0.018148 *
## Pu_Assets 1.855e-08 7.710e-09 2.407 0.016136 *
## Pr_Assets 1.049e-07 2.272e-08 4.618 3.95e-06 ***
## Pu_Bank 1.904e+02 3.090e+01 6.160 7.80e-10 ***
## Pr_Bank 1.158e+02 4.248e+01 2.725 0.006454 **
## Motorcycles_p -2.057e+04 3.434e+03 -5.989 2.24e-09 ***
## Wheeled_tractor_p 1.210e+06 5.042e+05 2.400 0.016409 *
## RURAL_URBAN_1 -2.817e+03 7.933e+02 -3.552 0.000386 ***
## RURAL_URBAN_3 -1.939e+03 6.544e+02 -2.963 0.003057 **
## RURAL_URBAN_4 2.171e+03 1.098e+03 1.977 0.048128 *
## RURAL_URBAN_2 5.597e+03 2.165e+03 2.585 0.009751 **
## RURAL_URBAN_5 1.033e+05 9.287e+03 11.123 < 2e-16 ***
## GVA_PUBLIC_p 1.453e+03 1.150e+02 12.634 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared: 0.3718, Adjusted R-squared: 0.3691
## F-statistic: 136.7 on 24 and 5544 DF, p-value: < 2.2e-16
Since p value is less than alpha value of 0.05, we reject the null hypothesis.The final model has an Adjusted R-Squared of 0.3691 . We will investigate the reason the low Adjusted R-Squared.
ols_vif_tol(muni.mlr)
## Variables Tolerance VIF
## 1 Active 0.4110149 2.433002
## 2 planted_ratio 0.7873648 1.270059
## 3 DU_density 0.7533492 1.327406
## 4 planted_value_hectare 0.9173524 1.090094
## 5 expenditures_capita 0.7977351 1.253549
## 6 COMP_A 0.9450595 1.058134
## 7 COMP_D 0.9885257 1.011608
## 8 COMP_E 0.8789610 1.137707
## 9 COMP_N 0.6025756 1.659543
## 10 COMP_Q 0.5672256 1.762967
## 11 COMP_S 0.7216954 1.385626
## 12 BEDS 0.8018460 1.247122
## 13 Pu_Assets 0.9695880 1.031366
## 14 Pr_Assets 0.9839621 1.016299
## 15 Pu_Bank 0.8327285 1.200872
## 16 Pr_Bank 0.8571219 1.166695
## 17 Motorcycles_p 0.8795494 1.136946
## 18 Wheeled_tractor_p 0.8255682 1.211287
## 19 RURAL_URBAN_1 0.6890810 1.451208
## 20 RURAL_URBAN_3 0.4411678 2.266711
## 21 RURAL_URBAN_4 0.7106230 1.407216
## 22 RURAL_URBAN_2 0.9378754 1.066240
## 23 RURAL_URBAN_5 0.6054314 1.651715
## 24 GVA_PUBLIC_p 0.9243386 1.081855
summary(muni.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D +
## COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets +
## Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +
## RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 +
## RURAL_URBAN_5 + GVA_PUBLIC_p, data = muni_merged_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39927 -6996 -2195 3343 272207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.356e+04 5.595e+03 -16.721 < 2e-16 ***
## Active 1.622e+05 9.069e+03 17.886 < 2e-16 ***
## planted_ratio 1.236e+03 8.927e+01 13.847 < 2e-16 ***
## DU_density -1.666e+03 7.934e+02 -2.100 0.035773 *
## planted_value_hectare 8.305e+01 3.615e+01 2.297 0.021638 *
## expenditures_capita 2.598e+00 1.454e-01 17.869 < 2e-16 ***
## COMP_A 1.458e+05 3.296e+04 4.422 9.94e-06 ***
## COMP_D 1.628e+07 2.170e+06 7.499 7.43e-14 ***
## COMP_E 7.816e+06 2.164e+06 3.612 0.000307 ***
## COMP_N 1.257e+06 2.520e+05 4.986 6.37e-07 ***
## COMP_Q 1.921e+06 5.261e+05 3.651 0.000264 ***
## COMP_S -5.653e+05 1.638e+05 -3.452 0.000561 ***
## BEDS 1.556e+01 6.586e+00 2.363 0.018148 *
## Pu_Assets 1.855e-08 7.710e-09 2.407 0.016136 *
## Pr_Assets 1.049e-07 2.272e-08 4.618 3.95e-06 ***
## Pu_Bank 1.904e+02 3.090e+01 6.160 7.80e-10 ***
## Pr_Bank 1.158e+02 4.248e+01 2.725 0.006454 **
## Motorcycles_p -2.057e+04 3.434e+03 -5.989 2.24e-09 ***
## Wheeled_tractor_p 1.210e+06 5.042e+05 2.400 0.016409 *
## RURAL_URBAN_1 -2.817e+03 7.933e+02 -3.552 0.000386 ***
## RURAL_URBAN_3 -1.939e+03 6.544e+02 -2.963 0.003057 **
## RURAL_URBAN_4 2.171e+03 1.098e+03 1.977 0.048128 *
## RURAL_URBAN_2 5.597e+03 2.165e+03 2.585 0.009751 **
## RURAL_URBAN_5 1.033e+05 9.287e+03 11.123 < 2e-16 ***
## GVA_PUBLIC_p 1.453e+03 1.150e+02 12.634 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared: 0.3718, Adjusted R-squared: 0.3691
## F-statistic: 136.7 on 24 and 5544 DF, p-value: < 2.2e-16
Test for non-linearity using ols_plot_resid_fit() by olsrr package
ols_plot_resid_fit(muni.mlr)
The figure above reveals that most of the data poitns are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent variable and independent variables are linear.Additionally, there does not seem to be any obvious signs of heteroscadicity in the plot above.
Test for normality using ols_plot_resid_hist() by oslrr package
ols_plot_resid_hist(muni.mlr)
The figure above reveals that the residuals of muni_cities.mlr resemble normal distribution. We would normally use ols_test_normality() to further test this assumption. But the function is limtied to sample sizes between 3 to 5000 and we have 5569 observations. We will skip this step as we have sufficient evidence from the plot that it passes normality test.
H0 : The residuals of the multiple linear regression model are randomly distributed. H1 : The residuals of the multiple linear regression model are not randomly distributed.
We will use a confidence level of 95% with an alpha value of 0.05
First, export the residual of the hedonic model as a dataframe
mlr.output <- as.data.frame(muni.mlr$residuals)
Join the dataframe with the simple feature object - into a Simple feature
muni_merged<- cbind(muni_merged,
muni.mlr$residuals) %>%
rename(`MLR_RES` = `muni.mlr.residuals`)
muni_merged.res.sf <- cbind(muni_merged.sf,
muni.mlr$residuals) %>%
rename(`MLR_RES` = `muni.mlr.residuals`) %>%
rename(RURAL_URBAN_1 = `Intermediário.Adjacente`) %>%
rename(RURAL_URBAN_2 = `Intermediário.Remoto`) %>%
rename(RURAL_URBAN_3 = `Rural.Adjacente`) %>%
rename(RURAL_URBAN_4 = `Rural.Remoto`) %>%
rename(RURAL_URBAN_5 = `Sem.classificação`)
Convert the simple feature into a SpatialPointsDataFrame
mc_res.sp <- as_Spatial(muni_merged.res.sf)
mc_res.sp
## class : SpatialPointsDataFrame
## features : 5569
## extent : -72.9165, -32.43519, -33.68757, 4.58544 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## variables : 58
## names : code_muni, CITY, STATE, GDP_CAPITA, CAPITAL, Pop_density, Brazilian_Foreigners_Ratio, Active, DU_density, DU_Urban_Rural_Ratio, planted_ratio, planted_value_hectare, IDHM, IDHM_Renda, IDHM_Longevidade, ...
## min values : 1100015, Abadia De Goiás, AC, 3190.57, 0, 0, 0, 0, 0, 0, 0, 0, 0.418, 0.4, 0.672, ...
## max values : 5300108, Zortéa, TO, 314637.69, 1, 1263.80737731171, 51708.6666666667, 0.744760130414532, 5.73393574297189, 7727.63636363636, 18.5998217696347, 106.960227272727, 0.862, 0.891, 0.894, ...
Visualize the distribution of residuals
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(muni_merged)+
tm_fill("MLR_RES",
n = 6,
style = "quantile",
palette = "RdYlBu" ) +
tm_borders(alpha = 0.5)
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
There seem to be spatial autocorrelation above. There isn’t a clear sign on whether it is clustered in any way or if theres a geospatial pattern in distribution.Further test to be done.
Build distance-based weight matrix
Find the distance summary
coords <- coordinates(mc_res.sp)
k <- knn2nb(knearneigh(coords))
k_dist <- unlist(nbdists(k, coords, longlat = TRUE))
summary(k_dist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6029 9.1050 13.1260 17.0100 19.7337 363.0083
Compute Distance-based matrix
nb <- knn2nb(knearneigh(coordinates(mc_res.sp), k=363, longlat = FALSE))
Convert into spatial weights
nb_lw <- nb2listw(nb, style = 'W')
lm.morantest(muni.mlr, nb_lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E
## + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + Pu_Assets +
## Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + RURAL_URBAN_1 +
## RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + RURAL_URBAN_5 +
## GVA_PUBLIC_p, data = muni_merged_new)
## weights: nb_lw
##
## Moran I statistic standard deviate = 21.121, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.625916e-02 -5.721239e-04 6.350277e-07
The Global Moran’s I test for residual spatial autocorrelation shows that it’s p-value is < 2.2e-16 which is less than the alpha value of 0.05. Hence, we will reject the null hypothesis that the residuals of the multiple linear regression model are randomly distributed.Since the Observed Global Moran I = 1.625916e-02 which is greater than 0, we can infer than the residuals resemble cluster distribution.
There are two possible approaches can be used to determine the stopping rule: + Cross-validation (CV) score + Maximises the model’s predictive power + The optimum bandwidth is derived from estimated with all its observations + AIC corrected (AICc) + Favours a compromise between the predictive power of the model and its complexity + Generally favours larger bandwidths than the CV score + Lower AICc value provides a better fit to the observed data
We will be using the adaptive bandwidth approach as each municipal have different sizes, and are spread at differing distances. Using fixed bandwidth will result in some municipal having too many neighbours to conduct any useful analysis.
coords <- coordinates(mc_res.sp)
dMat <- gw.dist(coords, focus=0, p=2, theta=0, longlat=F)
Adaptive argument is set to FALSE indicates that we are interested to compute the fixed bandwidth. CV score approach was defined as the stopping rule. Gaussian kernel Kernel is selected as gaussian to determine the optimal fixed bandwidth to use in explanatory model.
bw.adaptive <- bw.gwr(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E + COMP_N + COMP_Q + COMP_S+BEDS+ Pu_Assets + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, approach="CV", kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat = dMat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 3449 CV score: 1.470825e+12
## Adaptive bandwidth: 2140 CV score: 1.459161e+12
## Adaptive bandwidth: 1329 CV score: 1.445566e+12
## Adaptive bandwidth: 830 CV score: 1.446466e+12
## Adaptive bandwidth: 1640 CV score: 1.450185e+12
## Adaptive bandwidth: 1139 CV score: 1.445868e+12
## Adaptive bandwidth: 1448 CV score: 1.447149e+12
## Adaptive bandwidth: 1256 CV score: 1.445101e+12
## Adaptive bandwidth: 1210 CV score: 1.444943e+12
## Adaptive bandwidth: 1182 CV score: 1.445261e+12
## Adaptive bandwidth: 1227 CV score: 1.445041e+12
## Adaptive bandwidth: 1199 CV score: 1.44508e+12
## Adaptive bandwidth: 1216 CV score: 1.445033e+12
## Adaptive bandwidth: 1205 CV score: 1.444918e+12
## Adaptive bandwidth: 1203 CV score: 1.444927e+12
## Adaptive bandwidth: 1207 CV score: 1.444911e+12
## Adaptive bandwidth: 1208 CV score: 1.444914e+12
## Adaptive bandwidth: 1206 CV score: 1.444902e+12
## Adaptive bandwidth: 1206 CV score: 1.444902e+12
The results show that 1206 is the recommended data points
We will be using the following hypothesis:
H0:The GDP per capita are randomly distributed H1: he GDP per capita are not randomly distributed
The confidence level is 95% and the alpha value is 0.05
gwr.adaptive <- gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E + COMP_N + COMP_Q + COMP_S+BEDS+ Pu_Assets + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, bw=bw.adaptive, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)
gwr.adaptive
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-06-01 19:35:31
## Call:
## gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D +
## COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets +
## Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +
## RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 +
## RURAL_URBAN_5 + GVA_PUBLIC_p, data = mc_res.sp, bw = bw.adaptive,
## kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: Active planted_ratio DU_density planted_value_hectare expenditures_capita COMP_A COMP_D COMP_E COMP_N COMP_Q COMP_S BEDS Pu_Assets Pr_Assets Pu_Bank Pr_Bank Motorcycles_p Wheeled_tractor_p RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5 GVA_PUBLIC_p
## Number of data points: 5569
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39927 -6996 -2195 3343 272207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.356e+04 5.595e+03 -16.721 < 2e-16 ***
## Active 1.622e+05 9.069e+03 17.886 < 2e-16 ***
## planted_ratio 1.236e+03 8.927e+01 13.847 < 2e-16 ***
## DU_density -1.666e+03 7.934e+02 -2.100 0.035773 *
## planted_value_hectare 8.305e+01 3.615e+01 2.297 0.021638 *
## expenditures_capita 2.598e+00 1.454e-01 17.869 < 2e-16 ***
## COMP_A 1.458e+05 3.296e+04 4.422 9.94e-06 ***
## COMP_D 1.628e+07 2.170e+06 7.499 7.43e-14 ***
## COMP_E 7.816e+06 2.164e+06 3.612 0.000307 ***
## COMP_N 1.257e+06 2.520e+05 4.986 6.37e-07 ***
## COMP_Q 1.921e+06 5.261e+05 3.651 0.000264 ***
## COMP_S -5.653e+05 1.638e+05 -3.452 0.000561 ***
## BEDS 1.556e+01 6.586e+00 2.363 0.018148 *
## Pu_Assets 1.855e-08 7.710e-09 2.407 0.016136 *
## Pr_Assets 1.049e-07 2.272e-08 4.618 3.95e-06 ***
## Pu_Bank 1.904e+02 3.090e+01 6.160 7.80e-10 ***
## Pr_Bank 1.158e+02 4.248e+01 2.725 0.006454 **
## Motorcycles_p -2.057e+04 3.434e+03 -5.989 2.24e-09 ***
## Wheeled_tractor_p 1.210e+06 5.042e+05 2.400 0.016409 *
## RURAL_URBAN_1 -2.817e+03 7.933e+02 -3.552 0.000386 ***
## RURAL_URBAN_3 -1.939e+03 6.544e+02 -2.963 0.003057 **
## RURAL_URBAN_4 2.171e+03 1.098e+03 1.977 0.048128 *
## RURAL_URBAN_2 5.597e+03 2.165e+03 2.585 0.009751 **
## RURAL_URBAN_5 1.033e+05 9.287e+03 11.123 < 2e-16 ***
## GVA_PUBLIC_p 1.453e+03 1.150e+02 12.634 < 2e-16 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared: 0.3718
## Adjusted R-squared: 0.3691
## F-statistic: 136.7 on 24 and 5544 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 1.446143e+12
## Sigma(hat): 16117.4
## AIC: 123755.2
## AICc: 123755.5
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Adaptive bandwidth: 1206 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -1.7274e+05 -1.3546e+05 -9.9400e+04 -6.1219e+04
## Active 5.1946e+04 1.0827e+05 1.7583e+05 2.2981e+05
## planted_ratio 4.8197e+02 7.8775e+02 8.9275e+02 1.1929e+03
## DU_density -3.6234e+03 -2.3387e+03 -1.5817e+03 -1.3497e+03
## planted_value_hectare -5.4421e+01 8.7191e-01 2.4184e+01 1.6681e+02
## expenditures_capita 1.6338e+00 2.0054e+00 2.4594e+00 3.0437e+00
## COMP_A 9.4176e+04 1.1634e+05 1.4494e+05 2.0179e+05
## COMP_D 1.0954e+07 1.1862e+07 1.7204e+07 2.0414e+07
## COMP_E 5.3961e+06 6.0060e+06 6.6301e+06 8.9392e+06
## COMP_N 4.0213e+05 9.3414e+05 1.5075e+06 2.6039e+06
## COMP_Q 4.6028e+05 1.0050e+06 1.5126e+06 1.8844e+06
## COMP_S -9.5130e+05 -7.4147e+05 -6.3458e+05 -5.3186e+05
## BEDS 2.9484e+00 1.1232e+01 1.5878e+01 1.9906e+01
## Pu_Assets 5.9829e-10 1.1810e-08 1.7233e-08 2.0776e-08
## Pr_Assets 9.2609e-08 9.8002e-08 1.0136e-07 1.1011e-07
## Pu_Bank 1.1826e+02 1.4059e+02 1.7664e+02 2.2127e+02
## Pr_Bank 6.7282e+01 9.0944e+01 1.0634e+02 1.5669e+02
## Motorcycles_p -3.6365e+04 -3.1853e+04 -2.5444e+04 -1.1417e+04
## Wheeled_tractor_p -3.5129e+05 7.0059e+05 1.2385e+06 1.5875e+06
## RURAL_URBAN_1 -4.0419e+03 -2.7381e+03 -2.4534e+03 -2.1270e+03
## RURAL_URBAN_3 -4.3041e+03 -2.1951e+03 -1.6596e+03 -8.0206e+02
## RURAL_URBAN_4 -2.1975e+03 -9.4049e+02 6.8743e+01 2.3049e+03
## RURAL_URBAN_2 1.5528e+03 4.9652e+03 8.1162e+03 1.2372e+04
## RURAL_URBAN_5 3.6761e+04 8.5266e+04 1.1122e+05 1.5370e+05
## GVA_PUBLIC_p 6.4299e+02 1.1354e+03 1.4953e+03 1.7311e+03
## Max.
## Intercept -2.5451e+04
## Active 2.8647e+05
## planted_ratio 1.5685e+03
## DU_density -1.0460e+03
## planted_value_hectare 2.2937e+02
## expenditures_capita 3.5690e+00
## COMP_A 8.0693e+05
## COMP_D 3.0476e+07
## COMP_E 1.0995e+07
## COMP_N 3.2467e+06
## COMP_Q 2.4112e+06
## COMP_S -3.4383e+05
## BEDS 2.6006e+01
## Pu_Assets 0.0000e+00
## Pr_Assets 0.0000e+00
## Pu_Bank 2.7269e+02
## Pr_Bank 2.0541e+02
## Motorcycles_p -6.6058e+03
## Wheeled_tractor_p 2.2485e+06
## RURAL_URBAN_1 -1.7136e+03
## RURAL_URBAN_3 6.0635e+02
## RURAL_URBAN_4 5.2639e+03
## RURAL_URBAN_2 2.1293e+04
## RURAL_URBAN_5 2.0087e+05
## GVA_PUBLIC_p 1.8908e+03
## ************************Diagnostic information*************************
## Number of data points: 5569
## Effective number of parameters (2trace(S) - trace(S'S)): 84.38916
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5484.611
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 123496.9
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 123428.9
## Residual sum of squares: 1.360803e+12
## R-square value: 0.4088895
## Adjusted R-square value: 0.3997927
##
## ***********************************************************************
## Program stops at: 2020-06-01 19:36:27
Adjusted R-square value: 0.3997927 which i better than the Multiple Linear Regression model which has an Adjusted R-Squared of 0.3691 and P-VALUE < 2.2e-16 , we reject the null hypothesis
bw.gwr.aic <- bw.gwr(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E + COMP_N + COMP_Q + COMP_S+BEDS+ Pu_Assets + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, approach="AIC", kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat = dMat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 3449 AICc value: 123671.7
## Adaptive bandwidth (number of nearest neighbours): 2140 AICc value: 123602.1
## Adaptive bandwidth (number of nearest neighbours): 1329 AICc value: 123513.7
## Adaptive bandwidth (number of nearest neighbours): 830 AICc value: 123430.6
## Adaptive bandwidth (number of nearest neighbours): 519 AICc value: 123298.8
## Adaptive bandwidth (number of nearest neighbours): 329 AICc value: 123113.5
## Adaptive bandwidth (number of nearest neighbours): 209 AICc value: 122919.4
## Adaptive bandwidth (number of nearest neighbours): 137 AICc value: 122753
## Adaptive bandwidth (number of nearest neighbours): 90 AICc value: 122705.1
## Error in gw_reg(X, Y, W.i, TRUE, i) : inv(): matrix seems singular
## Adaptive bandwidth (number of nearest neighbours): 63 AICc value: Inf
## Adaptive bandwidth (number of nearest neighbours): 108 AICc value: 122707.1
## Adaptive bandwidth (number of nearest neighbours): 80 AICc value: 122702.9
## Adaptive bandwidth (number of nearest neighbours): 72 AICc value: Inf
## Adaptive bandwidth (number of nearest neighbours): 83 AICc value: 122701.6
## Adaptive bandwidth (number of nearest neighbours): 87 AICc value: 122700.6
## Adaptive bandwidth (number of nearest neighbours): 87 AICc value: 122700.6
The results show that 87 is the recommended data points
gwr.adaptive.aic <- gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E + COMP_N + COMP_Q + COMP_S+BEDS+ Pu_Assets + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, bw=bw.gwr.aic, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)
gwr.adaptive.aic
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-06-01 19:47:53
## Call:
## gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D +
## COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets +
## Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p +
## RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 +
## RURAL_URBAN_5 + GVA_PUBLIC_p, data = mc_res.sp, bw = bw.gwr.aic,
## kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: Active planted_ratio DU_density planted_value_hectare expenditures_capita COMP_A COMP_D COMP_E COMP_N COMP_Q COMP_S BEDS Pu_Assets Pr_Assets Pu_Bank Pr_Bank Motorcycles_p Wheeled_tractor_p RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5 GVA_PUBLIC_p
## Number of data points: 5569
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39927 -6996 -2195 3343 272207
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.356e+04 5.595e+03 -16.721 < 2e-16 ***
## Active 1.622e+05 9.069e+03 17.886 < 2e-16 ***
## planted_ratio 1.236e+03 8.927e+01 13.847 < 2e-16 ***
## DU_density -1.666e+03 7.934e+02 -2.100 0.035773 *
## planted_value_hectare 8.305e+01 3.615e+01 2.297 0.021638 *
## expenditures_capita 2.598e+00 1.454e-01 17.869 < 2e-16 ***
## COMP_A 1.458e+05 3.296e+04 4.422 9.94e-06 ***
## COMP_D 1.628e+07 2.170e+06 7.499 7.43e-14 ***
## COMP_E 7.816e+06 2.164e+06 3.612 0.000307 ***
## COMP_N 1.257e+06 2.520e+05 4.986 6.37e-07 ***
## COMP_Q 1.921e+06 5.261e+05 3.651 0.000264 ***
## COMP_S -5.653e+05 1.638e+05 -3.452 0.000561 ***
## BEDS 1.556e+01 6.586e+00 2.363 0.018148 *
## Pu_Assets 1.855e-08 7.710e-09 2.407 0.016136 *
## Pr_Assets 1.049e-07 2.272e-08 4.618 3.95e-06 ***
## Pu_Bank 1.904e+02 3.090e+01 6.160 7.80e-10 ***
## Pr_Bank 1.158e+02 4.248e+01 2.725 0.006454 **
## Motorcycles_p -2.057e+04 3.434e+03 -5.989 2.24e-09 ***
## Wheeled_tractor_p 1.210e+06 5.042e+05 2.400 0.016409 *
## RURAL_URBAN_1 -2.817e+03 7.933e+02 -3.552 0.000386 ***
## RURAL_URBAN_3 -1.939e+03 6.544e+02 -2.963 0.003057 **
## RURAL_URBAN_4 2.171e+03 1.098e+03 1.977 0.048128 *
## RURAL_URBAN_2 5.597e+03 2.165e+03 2.585 0.009751 **
## RURAL_URBAN_5 1.033e+05 9.287e+03 11.123 < 2e-16 ***
## GVA_PUBLIC_p 1.453e+03 1.150e+02 12.634 < 2e-16 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared: 0.3718
## Adjusted R-squared: 0.3691
## F-statistic: 136.7 on 24 and 5544 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 1.446143e+12
## Sigma(hat): 16117.4
## AIC: 123755.2
## AICc: 123755.5
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Adaptive bandwidth: 87 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -5.4629e+05 -1.2777e+05 -6.7949e+04 -2.6143e+04
## Active -5.4537e+03 5.2546e+04 1.2506e+05 2.1605e+05
## planted_ratio -1.6683e+03 1.8837e+02 6.4545e+02 1.3601e+03
## DU_density -3.5703e+04 -2.7308e+03 -7.3839e+02 1.1503e+03
## planted_value_hectare -1.7476e+03 -1.0581e+02 5.3066e+01 2.1085e+02
## expenditures_capita -5.8789e-01 7.7567e-01 1.4612e+00 3.0214e+00
## COMP_A -1.0705e+07 1.1739e+05 3.7802e+05 2.0203e+06
## COMP_D -2.5846e+07 3.1670e+06 1.2676e+07 5.0454e+07
## COMP_E -1.1534e+07 9.1163e+04 4.8487e+06 9.2929e+06
## COMP_N -4.6191e+06 4.0592e+05 1.5839e+06 2.9969e+06
## COMP_Q -8.0985e+06 -1.1377e+05 1.4255e+06 2.7713e+06
## COMP_S -2.6794e+06 -1.0350e+06 -3.7615e+05 -2.9591e+04
## BEDS -5.4589e+01 -5.4787e+00 6.4660e+00 1.9853e+01
## Pu_Assets -2.6278e-06 -2.8519e-07 -3.8908e-08 2.0332e-08
## Pr_Assets -1.6078e-05 8.9026e-08 4.5819e-07 4.2317e-06
## Pu_Bank -2.7718e+02 4.9562e+01 1.3283e+02 2.5057e+02
## Pr_Bank -4.1138e+02 7.9902e+00 8.9937e+01 2.6333e+02
## Motorcycles_p -8.8703e+04 -3.0836e+04 -1.4021e+04 -1.6920e+03
## Wheeled_tractor_p -2.2267e+07 -1.0084e+06 1.0705e+06 4.5573e+06
## RURAL_URBAN_1 -9.4069e+03 -2.7748e+03 -1.4794e+03 -3.4852e+02
## RURAL_URBAN_3 -1.3727e+04 -4.0365e+03 -1.0761e+03 7.4963e+02
## RURAL_URBAN_4 -6.4934e+04 -5.2464e+03 -1.3588e+03 2.4970e+03
## RURAL_URBAN_2 -5.2839e+04 -2.6828e+03 1.2654e+03 7.4415e+03
## RURAL_URBAN_5 -7.0391e+04 3.1256e+04 9.0883e+04 1.6457e+05
## GVA_PUBLIC_p -4.5569e+02 4.4710e+02 1.0128e+03 1.6687e+03
## Max.
## Intercept 3.4126e+03
## Active 8.1968e+05
## planted_ratio 5.1318e+03
## DU_density 3.4376e+04
## planted_value_hectare 6.5101e+02
## expenditures_capita 1.3516e+01
## COMP_A 8.9676e+06
## COMP_D 9.3676e+08
## COMP_E 9.8327e+07
## COMP_N 7.1392e+06
## COMP_Q 1.2614e+07
## COMP_S 3.2963e+06
## BEDS 1.5669e+02
## Pu_Assets 0.0000e+00
## Pr_Assets 1.0000e-04
## Pu_Bank 1.1505e+03
## Pr_Bank 3.2803e+03
## Motorcycles_p 2.4970e+04
## Wheeled_tractor_p 3.6013e+07
## RURAL_URBAN_1 5.1539e+03
## RURAL_URBAN_3 9.0297e+03
## RURAL_URBAN_4 2.8712e+04
## RURAL_URBAN_2 7.2807e+04
## RURAL_URBAN_5 5.1475e+05
## GVA_PUBLIC_p 6.4268e+03
## ************************Diagnostic information*************************
## Number of data points: 5569
## Effective number of parameters (2trace(S) - trace(S'S)): 842.3825
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 4726.617
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122700.6
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 121936.6
## Residual sum of squares: 943655332749
## R-square value: 0.5900915
## Adjusted R-square value: 0.5170217
##
## ***********************************************************************
## Program stops at: 2020-06-01 19:49:03
Adjusted R-square value: 0.5170217 which i better than the Multiple Linear Regression model which has an Adjusted R-Squared of 0.3691 . P-VALUE < 2.2e-16 , we reject the null hypothesis
Overall, the AIC approach is the most suitable scheme for the explanatory model. As the adjusted r-squared value for both methods are better than the multiple linear regression model, location affects GDP per capita.
merge data
gwr.adaptive.output <- as.data.frame(gwr.adaptive.aic$SDF)
muni_merged_adaptive <- cbind(muni_merged, as.matrix(gwr.adaptive.output))
muni_merged_adaptive
## Simple feature collection with 5569 features and 172 fields
## Active geometry column: geom
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 5.271841
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni.x code_state.x abbrev_state.x
## 1 1100015 Alta Floresta D'oeste 11 RO
## 2 1100023 Ariquemes 11 RO
## 3 1100031 Cabixi 11 RO
## 4 1100049 Cacoal 11 RO
## 5 1100056 Cerejeiras 11 RO
## 6 1100064 Colorado Do Oeste 11 RO
## 7 1100072 Corumbiara 11 RO
## 8 1100080 Costa Marques 11 RO
## 9 1100098 Espigão D'oeste 11 RO
## 10 1100106 Guajará-Mirim 11 RO
## area.x CITY STATE CAPITAL IBGE_RES_POP
## 1 7067196247 [m^2] Alta Floresta D'Oeste RO 0 24392
## 2 4427857503 [m^2] Ariquemes RO 0 90353
## 3 1314136221 [m^2] Cabixi RO 0 6313
## 4 3792029006 [m^2] Cacoal RO 0 78574
## 5 2783149399 [m^2] Cerejeiras RO 0 17029
## 6 1450315158 [m^2] Colorado Do Oeste RO 0 18591
## 7 3059753489 [m^2] Corumbiara RO 0 8783
## 8 4987285502 [m^2] Costa Marques RO 0 13678
## 9 4518626296 [m^2] Espigão D'Oeste RO 0 28729
## 10 24854844824 [m^2] Guajará-Mirim RO 0 41656
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1 24392 0 7276 4306 2970
## 2 90240 113 27236 22971 4265
## 3 6310 3 1979 887 1092
## 4 78536 38 24045 19411 4634
## 5 17011 18 5346 4561 785
## 6 18562 29 5962 4441 1521
## 7 8778 5 2655 834 1821
## 8 13462 216 3718 1980 1738
## 9 28720 9 8407 6342 2065
## 10 40709 947 10274 9550 724
## IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 IBGE_15.59 IBGE_60.
## 1 13804 203 869 1159 1281 9023 1269
## 2 69068 1122 4383 6101 6870 46000 4592
## 3 2689 43 191 206 272 1742 235
## 4 61699 923 3599 4790 5722 42027 4638
## 5 13755 200 827 1129 1333 8976 1290
## 6 13404 189 822 1002 1196 8801 1394
## 7 2559 47 159 238 240 1685 190
## 8 7366 137 644 843 902 4358 482
## 9 19649 324 1335 1652 1906 12988 1444
## 10 34686 643 2521 3546 3809 21608 2559
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_. IDHM.Ranking.2010 IDHM IDHM_Renda
## 1 18288 101470 3283 0.640 0.657
## 2 7115 33365 1847 0.700 0.716
## 3 41086 107975 3117 0.650 0.650
## 4 18180 174665 1377 0.718 0.727
## 5 52326 150755 2141 0.690 0.688
## 6 6268 21765 2326 0.685 0.676
## 7 71742 182267 3865 0.613 0.630
## 8 1395 5232 3901 0.611 0.616
## 9 5588 35076 2623 0.672 0.691
## 10 787 6764 2982 0.657 0.663
## IDHM_Longevidade IDHM_Educacao ALT PAY_TV FIXED_PHONES AREA
## 1 0.763 0.526 337.74 240 687 7067.03
## 2 0.806 0.600 138.69 2267 9191 4426.57
## 3 0.757 0.559 236.06 50 188 1314.35
## 4 0.821 0.620 177.45 1806 6491 3792.89
## 5 0.799 0.602 262.81 307 1215 2783.30
## 6 0.814 0.584 419.09 235 1107 1451.06
## 7 0.774 0.473 269.17 66 301 3060.32
## 8 0.751 0.493 145.21 192 346 4987.18
## 9 0.819 0.536 262.53 283 1172 4518.03
## 10 0.823 0.519 133.41 1116 3291 24855.72
## REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1 Vale Do Guaporé D 23167
## 2 Vale Do Jamari C 106168
## 3 Vale Do Guaporé D 5438
## 4 Br 364 - Caminhos De Rondon C 84813
## 5 <NA> <NA> 16444
## 6 <NA> <NA> 16227
## 7 <NA> <NA> 7567
## 8 Vale Do Guaporé D 17855
## 9 Br 364 - Caminhos De Rondon D 32047
## 10 Polo Guajará-Mirim C 45783
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 1 Intermediário Adjacente 166143.38 31.27 114455.32 142727.55
## 2 Urbano 145068.79 353163.07 879.97 589088.78
## 3 Rural Adjacente 59081.09 4623.27 24091.22 40041.59
## 4 Urbano 188004.72 230109.20 846490.22 485242.29
## 5 Urbano 52891.37 22445.48 175059.55 99495.64
## 6 Intermediário Adjacente 72684.77 28.37 85558.72 100799.66
## 7 Rural Remoto 132176.94 10032.64 30210.50 55416.89
## 8 Rural Remoto 62561.65 6619.74 38862.21 91.02
## 9 Intermediário Adjacente 115438.59 56841.46 159498.64 170735.98
## 10 Urbano 49929.14 29240.85 313.12 250.77
## X.GVA_TOTAL. TAXES GDP POP_GDP GDP_CAPITA
## 1 454596.58 23186.17 477782.74 25506 18732.17
## 2 1967287.27 216095.93 2183383.20 105896 20618.18
## 3 127837.15 5.51 133345.39 6289 21202.96
## 4 1749846.43 194940.22 1944786.64 87877 22130.78
## 5 349.89 58.16 408047.84 17959 22721.08
## 6 287409.88 18466.33 305876.20 18639 16410.55
## 7 227836.98 8.74 236.58 8749 27040.60
## 8 199063.72 7.06 206123.18 17031 12102.82
## 9 502514.66 42149.00 544663.66 32712 16650.27
## 10 643057.92 97.10 740.16 47048 15732.01
## GVA_MAIN
## 1 Administração, defesa, educação e saúde públicas e seguridade social
## 2 Demais serviços
## 3 Administração, defesa, educação e saúde públicas e seguridade social
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Administração, defesa, educação e saúde públicas e seguridade social
## 7 Pecuária, inclusive apoio à pecuária
## 8 Administração, defesa, educação e saúde públicas e seguridade social
## 9 Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 50218466 508 3 1 38 7 4 26 270
## 2 201979608 2221 24 28 223 1 12 79 1054
## 3 17904387 60 0 0 7 0 0 1 32
## 4 168366053 1846 13 9 180 1 7 80 839
## 5 42641592 379 6 0 40 0 1 7 185
## 6 31611296 264 3 2 29 3 0 11 147
## 7 25501329 94 2 0 9 0 0 2 54
## 8 25561919 322 1 0 37 0 0 10 158
## 9 59162828 612 1 2 96 2 0 31 306
## 10 NA 502 2 1 27 0 2 10 322
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1 23 20 8 1 3 12 10 4 21 13 6
## 2 68 136 27 32 12 91 116 5 68 74 28
## 3 2 4 2 0 0 1 2 2 2 1 0
## 4 76 102 34 19 31 83 93 4 52 104 18
## 5 19 20 3 1 3 15 8 3 10 12 2
## 6 11 13 3 3 0 8 9 4 5 4 1
## 7 1 3 0 0 0 3 4 4 2 4 0
## 8 10 14 2 1 0 4 6 4 10 6 3
## 9 31 15 5 3 5 17 15 4 16 20 5
## 10 18 32 5 2 0 10 12 5 12 9 3
## COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1 38 0 0 NA NA 1 2 1 2
## 2 143 0 0 1 78 2 4 2 3
## 3 4 0 0 1 27 NA NA NA NA
## 4 101 0 0 1 40 2 3 2 3
## 5 44 0 0 1 40 1 3 1 3
## 6 8 0 0 NA NA 1 2 1 2
## 7 6 0 0 NA NA NA NA NA NA
## 8 56 0 0 2 142 0 1 0 1
## 9 38 0 0 NA NA 1 2 1 2
## 10 30 0 0 2 67 1 3 1 3
## Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1 38958866 245288714 2464 9268 1 NA NA NA
## 2 156637075 2487942240 17513 42664 1 NA NA NA
## 3 NA NA 925 1838 0 NA NA NA
## 4 145453086 2339395610 17561 38073 3 NA NA NA
## 5 57939053 392087789 2948 6439 0 NA NA NA
## 6 77276523 237735184 3158 6874 0 NA NA NA
## 7 NA NA 831 2380 0 NA NA NA
## 8 0 94261213 823 2920 0 NA NA NA
## 9 48113784 193339551 3473 11579 4 NA NA NA
## 10 38194278 356229103 3906 9812 1 NA NA NA
## POST_OFFICES name_muni.y code_state.y abbrev_state.y
## 1 1 Alta Floresta D'oeste 11 RO
## 2 2 Ariquemes 11 RO
## 3 1 Cabixi 11 RO
## 4 2 Cacoal 11 RO
## 5 1 Cerejeiras 11 RO
## 6 1 Colorado Do Oeste 11 RO
## 7 1 Corumbiara 11 RO
## 8 2 Costa Marques 11 RO
## 9 2 Espigão D'oeste 11 RO
## 10 1 Guajará-Mirim 11 RO
## area.y MLR_RES Intercept Active planted_ratio
## 1 7067196247 [m^2] -12020.522 -81790.8172 136735.490 3221.53389
## 2 4427857503 [m^2] -5894.806 -228132.7968 377201.121 1309.86510
## 3 1314136221 [m^2] -7676.922 -94825.5544 159255.044 3386.81708
## 4 3792029006 [m^2] -1308.227 -168590.2009 285763.371 1374.62953
## 5 2783149399 [m^2] -8799.358 434.3238 2365.147 291.62911
## 6 1450315158 [m^2] 4818.894 -3609.0853 21849.293 -64.12743
## 7 3059753489 [m^2] -4769.690 -37676.2210 74759.926 738.61435
## 8 4987285502 [m^2] 1074.968 -15045.5275 39295.656 493.51277
## 9 4518626296 [m^2] -1104.994 -82727.9944 152067.067 358.71984
## 10 24854844824 [m^2] -24918.626 -62564.9170 115109.151 1427.33585
## DU_density planted_value_hectare expenditures_capita COMP_A.1 COMP_D.1
## 1 9124.8622 2.348508 1.2263023 183901.01 14062447
## 2 7749.5459 -154.470303 1.7475411 162745.11 77450580
## 3 6894.8037 19.149274 0.9367677 214693.29 6426898
## 4 6948.0735 -164.083760 1.3807173 38978.57 -2457925
## 5 1256.5443 283.923775 0.9415918 7787190.64 208600834
## 6 -998.0603 37.884183 0.2135672 1985352.21 9717070
## 7 -734.3313 352.675520 2.5368855 2572954.65 -1575933
## 8 -634.9888 58.461733 0.8470693 1148944.00 10107818
## 9 3196.8903 -261.543382 1.2287037 104670.02 44522571
## 10 -2417.6358 -79.316931 3.9260895 2460044.74 16098601
## COMP_E.1 COMP_N.1 COMP_Q.1 COMP_S.1 BEDS.1 Pu_Assets.1
## 1 8161865 -419438.4 3577178.40 -837335.61 67.6056848 2.573309e-08
## 2 -1778884 1030386.4 714175.61 -1870685.13 37.4029197 1.544545e-08
## 3 12064146 -547595.3 2352147.50 -937381.75 64.4677275 1.169330e-08
## 4 -4938728 3372418.9 -93895.07 -725282.66 1.3935288 -2.216843e-08
## 5 8043484 2168688.8 12110889.49 -1569835.95 43.4670988 1.915530e-07
## 6 -1420066 303539.4 388625.86 42491.29 15.0422851 1.411929e-07
## 7 14401477 2531312.1 -3646786.18 -31366.07 -5.3226287 -4.083299e-08
## 8 1398041 2414449.9 -1049949.14 -261448.71 9.4565118 5.842581e-08
## 9 3674043 3186917.0 -1251370.06 -372445.30 -0.2259307 -9.971455e-08
## 10 7892352 527726.4 1185028.97 -507749.26 5.0460608 -1.562359e-07
## Pr_Assets.1 Pu_Bank.1 Pr_Bank.1 Motorcycles_p Wheeled_tractor_p
## 1 -8.507633e-07 3.526389 120.961962 -33188.866 26176835.9
## 2 9.225396e-08 165.298763 74.695710 -47427.037 -378925.4
## 3 5.177459e-07 -16.830953 97.314352 -31304.321 29005199.0
## 4 1.011897e-07 370.547858 -36.640401 -55878.324 9312133.0
## 5 -5.498973e-06 114.264661 56.151427 6781.724 14080003.0
## 6 3.886574e-06 40.783697 3.356608 2726.624 4910068.0
## 7 1.642224e-06 244.707723 58.867459 -12350.688 1392647.9
## 8 2.161800e-06 122.763183 148.626622 -3130.386 479136.6
## 9 1.088409e-07 106.714240 277.678978 -10658.267 6511210.1
## 10 3.004948e-06 77.254939 396.002707 -28681.370 139593.5
## RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5
## 1 83.40903 2466.6462 6818.12239 24992.3133 146031.755
## 2 -1525.78445 5432.4798 4661.02294 24631.9052 286731.638
## 3 1146.27797 3818.2589 2925.47049 8448.5118 162763.847
## 4 -481.88436 3488.6070 -1002.43025 13820.4407 245977.182
## 5 -1484.27973 -261.5996 42.41828 1371.4774 2106.466
## 6 -2047.45224 -2949.2889 -3987.45126 -1323.1565 11355.143
## 7 -6149.86364 -6893.5634 -8075.41308 -4244.6785 95080.800
## 8 -2175.75462 -3452.9636 -4328.71432 -648.5095 22727.295
## 9 -2999.68152 -1790.8810 4948.95546 71997.0265 133385.613
## 10 -5109.45700 -4261.5036 -2738.34268 -6538.4753 56567.743
## GVA_PUBLIC_p y yhat residual CV_Score Stud_residual
## 1 1518.7074 20664.57 24081.976 -3417.4059 0 -0.25443223
## 2 1245.8657 25591.70 40583.081 -14991.3809 0 -1.21004035
## 3 1291.4609 15628.40 26315.966 -10687.5657 0 -0.78420254
## 4 610.9763 18250.42 19957.133 -1706.7125 0 -0.12463040
## 5 821.6804 8222.36 17261.498 -9039.1377 0 -0.70699870
## 6 143.2738 6370.41 6654.224 -283.8140 0 -0.02109761
## 7 920.8330 6982.70 7806.517 -823.8168 0 -0.06620694
## 8 323.6922 6256.80 5196.603 1060.1966 0 0.09049018
## 9 998.9155 21173.60 26177.432 -5003.8325 0 -0.39762594
## 10 1819.3957 24739.02 53037.021 -28298.0012 0 -2.13475403
## Intercept_SE Active_SE planted_ratio_SE DU_density_SE
## 1 46531.49 70900.81 474.8718 9198.145
## 2 21073.15 32165.23 227.6791 5062.159
## 3 39324.58 60670.06 488.5516 8104.618
## 4 23790.52 36312.98 329.2352 5606.601
## 5 22733.91 38234.90 1786.3611 4425.647
## 6 27223.15 43972.78 1157.6803 3365.626
## 7 24326.12 39831.41 808.0594 3276.072
## 8 16772.02 27095.87 452.7589 1951.257
## 9 37508.02 57999.54 251.7487 4762.059
## 10 17527.16 26357.93 283.9353 1766.524
## planted_value_hectare_SE expenditures_capita_SE COMP_A_SE COMP_D_SE
## 1 466.3661 0.5777266 114666.34 20580667
## 2 96.7850 0.3426780 37989.49 15137219
## 3 447.0636 0.5451733 139563.28 24715649
## 4 109.1037 0.4420232 58450.26 16526757
## 5 171.5802 0.7116749 5183358.62 53261396
## 6 187.2376 0.7418658 3845564.93 9260617
## 7 193.9674 0.6404575 3062217.31 21406442
## 8 134.4118 0.4500085 2930633.23 4634350
## 9 246.3038 0.5388786 65237.07 23676871
## 10 138.8924 0.4777640 889737.62 5131412
## COMP_E_SE COMP_N_SE COMP_Q_SE COMP_S_SE BEDS_SE Pu_Assets_SE Pr_Assets_SE
## 1 12568998 1243295.9 2007141 1359702.4 18.65744 4.728138e-08 4.582336e-06
## 2 6131440 760532.8 1235692 520464.9 13.69406 7.135043e-09 2.065692e-08
## 3 11696326 1168501.0 1760101 1017804.0 18.38738 4.738672e-08 4.582679e-06
## 4 8347299 923454.9 1548482 588201.0 19.30464 1.926100e-08 2.454826e-08
## 5 21734517 4562985.5 6920993 1591692.8 45.12674 8.790578e-07 1.587749e-05
## 6 16063748 4210504.7 4439774 751348.0 41.59532 9.032875e-07 2.865467e-05
## 7 18757150 2856030.8 3119070 745900.5 28.01771 7.730981e-08 4.510292e-06
## 8 11058068 2328663.6 2970073 575643.8 21.35235 4.178892e-07 4.767374e-06
## 9 7136855 1072360.9 1971822 749787.3 25.82491 2.138241e-07 2.734076e-08
## 10 3411436 540969.1 1106473 291876.4 18.66749 1.905825e-07 2.276738e-06
## Pu_Bank_SE Pr_Bank_SE Motorcycles_p_SE Wheeled_tractor_p_SE RURAL_URBAN_1_SE
## 1 195.07211 159.58865 17893.189 12301219 3467.111
## 2 99.50461 72.48511 10903.035 3177177 1697.271
## 3 185.88750 150.63077 17027.929 11472920 3400.672
## 4 146.24729 118.02740 13791.116 4413250 2380.205
## 5 283.04420 458.41269 15279.479 20148887 3231.886
## 6 322.64122 584.77076 13674.802 18586131 4084.991
## 7 243.33192 238.20835 17654.469 18752139 3102.002
## 8 170.05639 381.63985 9418.921 9555210 2204.494
## 9 128.68059 88.72080 21935.170 3408513 2440.624
## 10 53.01121 251.55979 13063.787 635494 3469.998
## RURAL_URBAN_3_SE RURAL_URBAN_4_SE RURAL_URBAN_2_SE RURAL_URBAN_5_SE
## 1 3131.592 6779.617 8298.985 49957.80
## 2 1637.794 4762.435 6790.064 25985.61
## 3 2879.105 5100.590 10009.645 43331.53
## 4 1996.715 5040.333 10742.306 28407.03
## 5 3008.234 3693.793 6135.595 27101.29
## 6 4325.620 8219.486 15403.639 30846.39
## 7 2945.812 3968.754 8391.658 31078.64
## 8 2077.206 5166.359 14302.321 21915.44
## 9 2537.558 9054.746 14939.489 39914.79
## 10 2155.517 12241.911 14726.952 20181.12
## GVA_PUBLIC_p_SE Intercept_TV Active_TV planted_ratio_TV DU_density_TV
## 1 618.8910 -1.75775197 1.92854618 6.78400745 0.9920329
## 2 329.8425 -10.82575524 11.72698411 5.75312017 1.5308776
## 3 566.9846 -2.41135603 2.62493627 6.93236356 0.8507253
## 4 389.2164 -7.08644348 7.86945571 4.17522054 1.2392667
## 5 450.6956 0.01910467 0.06185833 0.16325316 0.2839233
## 6 502.3422 -0.13257411 0.49688227 -0.05539304 -0.2965452
## 7 563.5005 -1.54879687 1.87690900 0.91405950 -0.2241499
## 8 333.3970 -0.89706113 1.45024508 1.09001224 -0.3254254
## 9 496.2932 -2.20560803 2.62186665 1.42491257 0.6713253
## 10 380.9301 -3.56959818 4.36715436 5.02697602 -1.3685836
## planted_value_hectare_TV expenditures_capita_TV COMP_A_TV COMP_D_TV
## 1 0.005035761 2.1226343 1.6037924 0.68328433
## 2 -1.596014975 5.0996601 4.2839508 5.11656590
## 3 0.042833448 1.7182936 1.5383222 0.26003354
## 4 -1.503924218 3.1236310 0.6668673 -0.14872395
## 5 1.654757976 1.3230645 1.5023446 3.91654836
## 6 0.202332172 0.2878784 0.5162706 1.04928967
## 7 1.818220629 3.9610523 0.8402260 -0.07361955
## 8 0.434944881 1.8823405 0.3920463 2.18106476
## 9 -1.061873305 2.2801123 1.6044561 1.88042462
## 10 -0.571067529 8.2176336 2.7649103 3.13726552
## COMP_E_TV COMP_N_TV COMP_Q_TV COMP_S_TV BEDS_TV Pu_Assets_TV
## 1 0.64936478 -0.33736004 1.78222615 -0.61582269 3.623524252 0.5442541
## 2 -0.29012493 1.35482178 0.57795591 -3.59425811 2.731323811 2.1647305
## 3 1.03144752 -0.46863062 1.33637085 -0.92098454 3.506086036 0.2467632
## 4 -0.59165579 3.65195831 -0.06063684 -1.23305233 0.072186202 -1.1509494
## 5 0.37007881 0.47527847 1.74987744 -0.98626814 0.963222575 0.2179072
## 6 -0.08840189 0.07209099 0.08753280 0.05655340 0.361634097 0.1563101
## 7 0.76778597 0.88630420 -1.16919027 -0.04205128 -0.189973701 -0.5281735
## 8 0.12642720 1.03683927 -0.35350956 -0.45418484 0.442879272 0.1398117
## 9 0.51479864 2.97186999 -0.63462626 -0.49673464 -0.008748557 -0.4663390
## 10 2.31349859 0.97552041 1.07099641 -1.73960340 0.270312787 -0.8197810
## Pr_Assets_TV Pu_Bank_TV Pr_Bank_TV Motorcycles_p_TV Wheeled_tractor_p_TV
## 1 -0.1856615 0.01807736 0.75796092 -1.8548324 2.12798708
## 2 4.4660080 1.66121712 1.03049729 -4.3498930 -0.11926482
## 3 0.1129789 -0.09054376 0.64604565 -1.8384104 2.52814435
## 4 4.1220743 2.53370743 -0.31043980 -4.0517623 2.11003976
## 5 -0.3463377 0.40369900 0.12249099 0.4438452 0.69879804
## 6 0.1356349 0.12640572 0.00574004 0.1993904 0.26417914
## 7 0.3641059 1.00565404 0.24712592 -0.6995786 0.07426608
## 8 0.4534572 0.72189693 0.38944209 -0.3323508 0.05014401
## 9 3.9809008 0.82929553 3.12980695 -0.4858985 1.91027869
## 10 1.3198479 1.45733213 1.57418920 -2.1954867 0.21966140
## RURAL_URBAN_1_TV RURAL_URBAN_3_TV RURAL_URBAN_4_TV RURAL_URBAN_2_TV
## 1 0.02405721 0.7876652 1.00567956 3.01149038
## 2 -0.89896360 3.3169488 0.97870586 3.62763938
## 3 0.33707399 1.3261965 0.57355526 0.84403708
## 4 -0.20245496 1.7471733 -0.19888177 1.28654323
## 5 -0.45926118 -0.0869612 0.01148366 0.22352801
## 6 -0.50121346 -0.6818188 -0.48512174 -0.08589896
## 7 -1.98254635 -2.3401237 -2.03474795 -0.50582117
## 8 -0.98696324 -1.6623112 -0.83786561 -0.04534295
## 9 -1.22906316 -0.7057497 0.54655927 4.81924292
## 10 -1.47246693 -1.9770213 -0.22368589 -0.44398020
## RURAL_URBAN_5_TV GVA_PUBLIC_p_TV Local_R2 coords.x1 coords.x2
## 1 2.92310195 2.4539173 0.4515786 -49.44055 -16.758812
## 2 11.03424812 3.7771538 0.4558879 -47.39683 -18.487565
## 3 3.75624521 2.2777709 0.4454983 -48.71881 -16.182672
## 4 8.65902478 1.5697601 0.3843157 -45.44619 -19.155848
## 5 0.07772568 1.8231383 0.8293956 -48.88440 -1.723470
## 6 0.36811908 0.2852115 0.9777543 -39.04755 -7.356977
## 7 3.05936168 1.6341300 0.7694517 -41.66161 -13.253532
## 8 1.03704485 0.9708910 0.8836936 -39.11659 -8.723418
## 9 3.34175938 2.0127528 0.4646219 -50.31253 -23.300494
## 10 2.80300260 4.7761926 0.5307132 -51.02527 -27.608987
## geom geometry
## 1 POLYGON ((-62.19465 -11.827... POINT (-61.99982 -11.93554)
## 2 POLYGON ((-62.53648 -9.7322... POINT (-63.03327 -9.908463)
## 3 POLYGON ((-60.37075 -13.363... POINT (-60.54431 -13.49976)
## 4 POLYGON ((-61.0008 -11.2973... POINT (-61.44294 -11.43387)
## 5 POLYGON ((-61.49976 -13.005... POINT (-60.81843 -13.19503)
## 6 POLYGON ((-60.50475 -12.966... POINT (-60.55507 -13.13056)
## 7 POLYGON ((-61.34273 -12.666... POINT (-60.9487 -12.99752)
## 8 POLYGON ((-63.71199 -11.650... POINT (-64.23165 -12.43601)
## 9 POLYGON ((-60.94827 -10.988... POINT (-61.02017 -11.52855)
## 10 POLYGON ((-65.37724 -10.431... POINT (-65.32395 -10.77388)
Check if any empty geometry after joining
check(muni_merged_adaptive)
## [1] "For: muni_merged_adaptive"
## [1] "Number of Invalid polygons: 0"
summary(gwr.adaptive.aic$SDF$Local_R2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3191 0.4925 0.6173 0.6408 0.7669 0.9857
tm_shape(muni_merged_adaptive)+
tm_layout(legend.outside = TRUE)+
tm_polygons(col="Local_R2", border.alpha = 0.1)
The local R2 values determines have better prediction using the model. By plotting the Local R2 values by municipality, we can identify which municipalities’ GDP_CAPITAL is predicted well and not predicted well with GWRmodel.The values range from 0.3 to 1.0 .Mapping the local R2 values on the brazil municipality spatialpolygons to determine where does GWR predicts better and where it doesn’t could provide insights on which variables affects the regression model. If the values are low it would mean that the local model is performing poorly. It is hard to interpet from the plot as there seems to be little pattern in the distribution except the low r2 values in the central of brazil.