1 Assignment Overview:

In this take-home exercise, you are tasked with determining the factors affecting the unequal development of Brazil at the municipality level using the data provided. The specific tasks of the analysis are as follows:

  • Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level.

  • Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using multiple linear regression method.

  • Prepare a choropleth map showing the distribution of the residual of the GDP per capita.

  • Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using geographically weighted regression method.

  • Prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.

2 Setup

2.1 Loading in the necessary packages

The R packages needed for this exercise are as follows:

  • Geospatial statistical modelling: GWmodel, heatmaply, spatstat

  • Spatial data handling: sf, geobr

  • Attribute data handling: tidyverse, readr, ggplot2 and dplyr

  • Choropleth mapping: tmap

  • Saving and loading geospatial data: rgdal (for easier loading of data)

The code chunk below installs any missing packages and loads them all into the R environment.

packages <- c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'rgdal', 'heatmaply', 'spatstat')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2.2 Creating testing functions for validity and NA values

# Retrieves a quick breakdown of the number of NA rows and invalid polygons/points
Validity_NA_Check <- function(target_st) {
  validity <- st_is_valid(target_st)
  NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
  Invalid_rows <- which(validity==FALSE)
  print(paste("For:", deparse(substitute(target_st))))
  print(paste("Number of Invalid polygons/points is:", length(Invalid_rows)))
  print(paste("Number of NA rows is:", nrow((NA_rows))))
}

# Retrieves the exact polygon which is invalid
get_invalid <- function(target_st) {
  validity <- st_is_valid(target_st)
  Invalid_rows <- which(validity==FALSE)
  return(Invalid_rows)
}

# Retrieves the exact rows which contain NA values for you to check the columns
get_NA_rows <- function(target_st) {
  NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
  return(NA_rows)
}

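# A cleaning function that replaces NA with 0 in the named column so that calculations can still be done.
## This function is a little unnecessary as we will not be using the data attached to the geospatial points.
replace_NA_with_zero <- function(x, column_name){
  # Index by name with [[ ]]; x$column_name would look for a column literally called "column_name"
  x[[column_name]][is.na(x[[column_name]])] <- 0
  # Return the modified object so the caller can reassign it
  return(x)
}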
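As a quick sanity check of the NA-handling helpers, the sketch below applies the same idea to a small, made-up data frame using base R only. The names `toy` and `fill_na_with_zero` are hypothetical and not part of the analysis; the values are invented for illustration.

```r
# Hypothetical toy data; not taken from the Brazil dataset
toy <- data.frame(
  id   = 1:4,
  pop  = c(10, NA, 30, NA),
  area = c(1.5, 2.0, NA, 4.0)
)

# Rows containing at least one NA (the same test used in get_NA_rows)
na_rows <- toy[rowSums(is.na(toy)) != 0, ]

# Replace NA with 0 in one named column and return the data frame
fill_na_with_zero <- function(x, column_name) {
  x[[column_name]][is.na(x[[column_name]])] <- 0
  x
}

toy_filled <- fill_na_with_zero(toy, "pop")

print(nrow(na_rows))     # number of rows with any NA
print(toy_filled$pop)    # pop column after replacement
```

Note that the helper only touches the column it is given, so the NA in `area` is left in place.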

3 Data Wrangling and Formatting

3.1 Aspatial Data Wrangling

3.1.1 Importing the aspatial data

BRAZIL_CITIES.csv is a semicolon-delimited text file. The code chunk below uses the read_delim() function of the readr package to import it into R as a tibble data frame called Brazil_cities_raw.

Brazil_cities_raw = read_delim("data/aspatial/BRAZIL_CITIES.csv", ";")

3.1.2 Importing Data Dictionary as a dataframe for reference

Reference = read_delim("data/aspatial/Data_Dictionary.csv", ";")

3.1.3 Checking input data

summary(Brazil_cities_raw)
##      CITY              STATE              CAPITAL          IBGE_RES_POP     
##  Length:5573        Length:5573        Min.   :0.000000   Min.   :     805  
##  Class :character   Class :character   1st Qu.:0.000000   1st Qu.:    5235  
##  Mode  :character   Mode  :character   Median :0.000000   Median :   10934  
##                                        Mean   :0.004845   Mean   :   34278  
##                                        3rd Qu.:0.000000   3rd Qu.:   23424  
##                                        Max.   :1.000000   Max.   :11253503  
##                                                           NA's   :8         
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR     IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.0   Min.   :    239   Min.   :     60  
##  1st Qu.:    5230   1st Qu.:     0.0   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10926   Median :     0.0   Median :   3174   Median :   1846  
##  Mean   :   34200   Mean   :    77.5   Mean   :  10303   Mean   :   8859  
##  3rd Qu.:   23390   3rd Qu.:    10.0   3rd Qu.:   6726   3rd Qu.:   4624  
##  Max.   :11133776   Max.   :119727.0   Max.   :3576148   Max.   :3548433  
##  NA's   :8          NA's   :8          NA's   :10        NA's   :10       
##  IBGE_DU_RURAL      IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  487   1st Qu.:    2801   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931   Median :    6170   Median :    92.0   Median :   376  
##  Mean   : 1463   Mean   :   27595   Mean   :   383.3   Mean   :  1544  
##  3rd Qu.: 1832   3rd Qu.:   15302   3rd Qu.:   232.0   3rd Qu.:   951  
##  Max.   :33809   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :81      NA's   :8          NA's   :8          NA's   :8       
##     IBGE_5-9        IBGE_10-14       IBGE_15-59         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   259   1st Qu.:   1734   1st Qu.:    341  
##  Median :   516   Median :   588   Median :   3841   Median :    722  
##  Mean   :  2069   Mean   :  2381   Mean   :  18212   Mean   :   3004  
##  3rd Qu.:  1300   3rd Qu.:  1478   3rd Qu.:   9628   3rd Qu.:   1724  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##  NA's   :8        NA's   :8        NA's   :8         NA's   :8        
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010      IDHM       
##  Min.   :      0.0   Min.   :      0        Min.   :   1      Min.   :0.4180  
##  1st Qu.:    910.2   1st Qu.:   2326        1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3471.5   Median :  13846        Median :2783      Median :0.6650  
##  Mean   :  14179.9   Mean   :  57384        Mean   :2783      Mean   :0.6592  
##  3rd Qu.:  11194.2   3rd Qu.:  55619        3rd Qu.:4174      3rd Qu.:0.7180  
##  Max.   :1205669.0   Max.   :3274885        Max.   :5565      Max.   :0.8620  
##  NA's   :3           NA's   :3              NA's   :8         NA's   :8       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.40  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :8        NA's   :8        NA's   :8        NA's   :9       
##       LAT               ALT               PAY_TV         FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1   Min.   :      3  
##  1st Qu.:-22.838   1st Qu.:   169.8   1st Qu.:     88   1st Qu.:    119  
##  Median :-18.089   Median :   406.5   Median :    247   Median :    327  
##  Mean   :-16.444   Mean   :   893.8   Mean   :   3094   Mean   :   6567  
##  3rd Qu.: -8.489   3rd Qu.:   628.9   3rd Qu.:    815   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668   Max.   :5543127  
##  NA's   :9         NA's   :9          NA's   :3         NA's   :3        
##       AREA            REGIAO_TUR        CATEGORIA_TUR      ESTIMATED_POP     
##  Min.   :     3.57   Length:5573        Length:5573        Min.   :     786  
##  1st Qu.:   204.44   Class :character   Class :character   1st Qu.:    5454  
##  Median :   416.59   Mode  :character   Mode  :character   Median :   11590  
##  Mean   :  1517.44                                         Mean   :   37432  
##  3rd Qu.:  1026.57                                         3rd Qu.:   25296  
##  Max.   :159533.33                                         Max.   :12176866  
##  NA's   :3                                                 NA's   :3         
##  RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES      
##  Length:5573        Min.   :      0   Min.   :       1   Min.   :        2  
##  Class :character   1st Qu.:   4189   1st Qu.:    1726   1st Qu.:    10112  
##  Mode  :character   Median :  20426   Median :    7424   Median :    31211  
##                     Mean   :  47271   Mean   :  175928   Mean   :   489451  
##                     3rd Qu.:  51227   3rd Qu.:   41022   3rd Qu.:   115406  
##                     Max.   :1402282   Max.   :63306755   Max.   :464656988  
##                     NA's   :3         NA's   :3          NA's   :3          
##    GVA_PUBLIC         GVA_TOTAL             TAXES                GDP           
##  Min.   :       7   Min.   :       17   Min.   :   -14159   Min.   :       15  
##  1st Qu.:   17267   1st Qu.:    42253   1st Qu.:     1305   1st Qu.:    43709  
##  Median :   35866   Median :   119492   Median :     5100   Median :   125153  
##  Mean   :  123768   Mean   :   832987   Mean   :   118864   Mean   :   954584  
##  3rd Qu.:   89245   3rd Qu.:   313963   3rd Qu.:    22197   3rd Qu.:   329539  
##  Max.   :41902893   Max.   :569910503   Max.   :117125387   Max.   :687035890  
##  NA's   :3          NA's   :3           NA's   :3           NA's   :3          
##     POP_GDP           GDP_CAPITA       GVA_MAIN          MUN_EXPENDIT      
##  Min.   :     815   Min.   :  3191   Length:5573        Min.   :1.421e+06  
##  1st Qu.:    5483   1st Qu.:  9058   Class :character   1st Qu.:1.573e+07  
##  Median :   11578   Median : 15870   Mode  :character   Median :2.746e+07  
##  Mean   :   36998   Mean   : 21126                      Mean   :1.043e+08  
##  3rd Qu.:   25085   3rd Qu.: 26155                      3rd Qu.:5.666e+07  
##  Max.   :12038175   Max.   :314638                      Max.   :4.577e+10  
##  NA's   :3          NA's   :3                           NA's   :1492       
##     COMP_TOT            COMP_A            COMP_B            COMP_C        
##  Min.   :     6.0   Min.   :   0.00   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000   1st Qu.:    3.00  
##  Median :   162.0   Median :   2.00   Median :  0.000   Median :   11.00  
##  Mean   :   906.8   Mean   :  18.25   Mean   :  1.852   Mean   :   73.44  
##  3rd Qu.:   448.0   3rd Qu.:   8.00   3rd Qu.:  2.000   3rd Qu.:   39.00  
##  Max.   :530446.0   Max.   :1948.00   Max.   :274.000   Max.   :31566.00  
##  NA's   :3          NA's   :3         NA's   :3         NA's   :3         
##      COMP_D             COMP_E            COMP_F             COMP_G        
##  Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00   Min.   :     1.0  
##  1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00   1st Qu.:    32.0  
##  Median :  0.0000   Median :  0.000   Median :    4.00   Median :    74.5  
##  Mean   :  0.4262   Mean   :  2.029   Mean   :   43.26   Mean   :   348.0  
##  3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00   3rd Qu.:   199.0  
##  Max.   :332.0000   Max.   :657.000   Max.   :25222.00   Max.   :150633.0  
##  NA's   :3          NA's   :3         NA's   :3          NA's   :3         
##      COMP_H          COMP_I             COMP_J             COMP_K        
##  Min.   :    0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1   1st Qu.:    2.00   1st Qu.:    0.00   1st Qu.:    0.00  
##  Median :    7   Median :    7.00   Median :    1.00   Median :    0.00  
##  Mean   :   41   Mean   :   55.88   Mean   :   24.74   Mean   :   15.55  
##  3rd Qu.:   25   3rd Qu.:   24.00   3rd Qu.:    5.00   3rd Qu.:    2.00  
##  Max.   :19515   Max.   :29290.00   Max.   :38720.00   Max.   :23738.00  
##  NA's   :3       NA's   :3          NA's   :3          NA's   :3         
##      COMP_L             COMP_M             COMP_N            COMP_O       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.0   Min.   :  0.000  
##  1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.0   1st Qu.:  2.000  
##  Median :    0.00   Median :    4.00   Median :    4.0   Median :  2.000  
##  Mean   :   15.14   Mean   :   51.29   Mean   :   83.7   Mean   :  3.269  
##  3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.0   3rd Qu.:  3.000  
##  Max.   :14003.00   Max.   :49181.00   Max.   :76757.0   Max.   :204.000  
##  NA's   :3          NA's   :3          NA's   :3         NA's   :3        
##      COMP_P             COMP_Q             COMP_R            COMP_S        
##  Min.   :    0.00   Min.   :    0.00   Min.   :   0.00   Min.   :    0.00  
##  1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00   1st Qu.:    5.00  
##  Median :    6.00   Median :    3.00   Median :   2.00   Median :   12.00  
##  Mean   :   30.96   Mean   :   34.15   Mean   :  12.18   Mean   :   51.61  
##  3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00   3rd Qu.:   31.00  
##  Max.   :16030.00   Max.   :22248.00   Max.   :6687.00   Max.   :24832.00  
##  NA's   :3          NA's   :3          NA's   :3         NA's   :3         
##      COMP_T      COMP_U              HOTELS            BEDS        
##  Min.   :0   Min.   :  0.00000   Min.   : 1.000   Min.   :    2.0  
##  1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000   1st Qu.:   40.0  
##  Median :0   Median :  0.00000   Median : 1.000   Median :   82.0  
##  Mean   :0   Mean   :  0.05027   Mean   : 3.131   Mean   :  257.5  
##  3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000   3rd Qu.:  200.0  
##  Max.   :0   Max.   :123.00000   Max.   :97.000   Max.   :13247.0  
##  NA's   :3   NA's   :3           NA's   :4686     NA's   :4686     
##   Pr_Agencies        Pu_Agencies         Pr_Bank          Pu_Bank    
##  Min.   :   0.000   Min.   :  0.000   Min.   : 0.000   Min.   :0.00  
##  1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000   1st Qu.:1.00  
##  Median :   1.000   Median :  2.000   Median : 1.000   Median :2.00  
##  Mean   :   3.383   Mean   :  2.829   Mean   : 1.312   Mean   :1.58  
##  3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000   3rd Qu.:2.00  
##  Max.   :1693.000   Max.   :626.000   Max.   :83.000   Max.   :8.00  
##  NA's   :2231       NA's   :2231      NA's   :2231     NA's   :2231  
##    Pr_Assets           Pu_Assets              Cars          Motorcycles     
##  Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2   Min.   :      4  
##  1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602   1st Qu.:    591  
##  Median :3.231e+07   Median :1.339e+08   Median :   1438   Median :   1285  
##  Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9859   Mean   :   4879  
##  3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4086   3rd Qu.:   3294  
##  Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995   Max.   :1134570  
##  NA's   :2231        NA's   :2231        NA's   :11        NA's   :11       
##  Wheeled_tractor         UBER           MAC             WAL-MART     
##  Min.   :   0.000   Min.   :1      Min.   :  1.000   Min.   : 1.000  
##  1st Qu.:   0.000   1st Qu.:1      1st Qu.:  1.000   1st Qu.: 1.000  
##  Median :   0.000   Median :1      Median :  2.000   Median : 1.000  
##  Mean   :   5.754   Mean   :1      Mean   :  4.277   Mean   : 2.059  
##  3rd Qu.:   1.000   3rd Qu.:1      3rd Qu.:  3.000   3rd Qu.: 1.750  
##  Max.   :3236.000   Max.   :1      Max.   :130.000   Max.   :26.000  
##  NA's   :11         NA's   :5448   NA's   :5407      NA's   :5471    
##   POST_OFFICES    
##  Min.   :  1.000  
##  1st Qu.:  1.000  
##  Median :  1.000  
##  Mean   :  2.081  
##  3rd Qu.:  2.000  
##  Max.   :225.000  
##  NA's   :120

The summary shows missing values in most columns, so extensive data cleaning is required before the data are useful and regressions can be formulated.

3.1.4 Data Cleaning

3.1.4.1 Observing quality of data

Unfortunately, many rows have missing values. In fact, almost every row is missing at least one value: UBER alone is NA for 5,448 of the 5,573 rows. We will clean the dataset as best we can in order to formulate our desired indicators and test which variables affect GDP per capita.
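A compact way to quantify the missingness is to count NA values per column and compute the share of rows that are incomplete. The sketch below does this on a made-up data frame using base R only; `toy`, `na_per_column` and `share_incomplete` are hypothetical names, and the values are invented for illustration.

```r
# Hypothetical toy data standing in for the raw city table
toy <- data.frame(
  city       = c("A", "B", "C", "D"),
  gdp_capita = c(100, NA, 300, 400),
  hotels     = c(NA, NA, NA, 2),
  stringsAsFactors = FALSE
)

# NA count per column, worst offenders first
na_per_column <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Share of rows with at least one missing value
share_incomplete <- mean(!complete.cases(toy))

print(na_per_column)
print(share_incomplete)
```

The same two lines can be pointed at Brazil_cities_raw to rank its columns by number of missing values.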

3.1.4.2 Checking for duplicates

which(duplicated(Brazil_cities_raw[,1]))
##   [1]   48   50   51   91  142  143  159  179  207  226  261  270  318  352  370
##  [16]  418  434  484  497  508  517  539  551  563  582  583  591  634  635  644
##  [31]  657  670  671  676  677  678  679  693  703  704  709  715  716  717  730
##  [46]  766  813  851  856  857  877  885  939  957  973 1007 1009 1015 1041 1042
##  [61] 1049 1058 1089 1102 1162 1184 1210 1212 1217 1306 1317 1351 1353 1485 1486
##  [76] 1535 1620 1646 1673 1699 1723 1748 1762 1790 1805 1827 1901 1982 2004 2006
##  [91] 2062 2072 2163 2189 2195 2198 2253 2258 2273 2285 2327 2343 2344 2375 2381
## [106] 2393 2465 2489 2514 2531 2539 2547 2557 2640 2652 2661 2662 2702 2707 2713
## [121] 2724 2744 2935 2992 3053 3062 3082 3135 3151 3182 3213 3216 3217 3245 3251
## [136] 3298 3324 3354 3356 3357 3378 3387 3390 3405 3406 3422 3483 3484 3490 3502
## [151] 3521 3533 3536 3552 3580 3625 3635 3659 3670 3693 3702 3764 3785 3789 3811
## [166] 3813 3845 3868 3880 3881 3882 4003 4008 4015 4019 4025 4027 4031 4040 4073
## [181] 4092 4116 4141 4148 4152 4158 4195 4201 4232 4296 4312 4324 4351 4363 4369
## [196] 4370 4397 4401 4402 4403 4407 4408 4409 4411 4419 4422 4423 4424 4433 4454
## [211] 4473 4482 4488 4489 4490 4499 4538 4590 4611 4617 4618 4619 4620 4643 4644
## [226] 4645 4651 4663 4674 4686 4688 4724 4776 4829 4862 4891 4912 4917 4924 4937
## [241] 4941 5027 5038 5074 5077 5085 5115 5145 5156 5159 5162 5164 5191 5207 5222
## [256] 5226 5258 5302 5305 5306 5340 5346 5425 5435 5439 5450 5457 5471 5472 5473
## [271] 5491 5498 5499 5556
which(duplicated(Brazil_cities_raw[,1:2]))
## integer(0)

A large number of city names are repeated: 274 rows carry a name that already appears earlier in the table. No CITY and STATE combination is repeated, however. Since duplicate names would cause problems in later join operations, we will create unique identifiers by combining CITY with the STATE column.
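The difference between checking one column and checking a combination of columns can be seen on a tiny example. This is a minimal sketch in base R; the city and state values are made up, and `toy`, `dup_name`, `dup_pair` and `key` are hypothetical names.

```r
# Made-up rows: the same city name occurs in two different states
toy <- data.frame(
  CITY  = c("Exampleville", "Exampleville", "Sampletown"),
  STATE = c("AA", "BB", "AA"),
  stringsAsFactors = FALSE
)

# Duplicates on the name alone: the second row repeats the first
dup_name <- which(duplicated(toy$CITY))

# Duplicates on the (CITY, STATE) pair: none
dup_pair <- which(duplicated(toy[, c("CITY", "STATE")]))

# A combined key built with paste() is therefore unique
key <- paste(toy$CITY, toy$STATE, sep = "_")

print(dup_name)
print(length(dup_pair))
print(anyDuplicated(key))   # 0 means no duplicate keys
```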

3.1.4.3 Creating unique identifiers for each row

Brazil_cities_uniques <- cbind(CITY_STATE = paste(Brazil_cities_raw$CITY, Brazil_cities_raw$STATE, sep="_"), Brazil_cities_raw)
which(duplicated(Brazil_cities_uniques[,1]))
## integer(0)

3.1.4.4 Removing columns measured after 2016

Since we are looking at factors that might contribute to differences in GDP per capita in 2016, we will, with reference to the Data_Dictionary, remove variables that were measured after 2016.

NOTE: This matters because building an explanatory model on factors measured after the outcome would be a logical fallacy and invites reverse causation. Time-invariant variables such as AREA are unaffected, as they stay constant regardless of the measurement year. We will still have enough original and derived variables to perform our analysis.

We will also remove MUN_EXPENDIT because of the large number of missing data points (1,492 of 5,573 rows) and our inability to properly estimate these values from external sources. With this many rows affected, it would be ill-advised to drop the rows rather than the column itself.

Lastly, we will remove COMP_T, as every recorded value in it is 0 and it therefore carries no information.

drops <- c("IBGE_PLANTED_AREA","IBGE_CROP_PRODUCTION_$", "PAY_TV", "FIXED_PHONES", "ESTIMATED_POP", "REGIAO_TUR", "CATEGORIA_TUR", "HOTELS", "BEDS", "Pr_Agencies", "Pu_Agencies", "Pr_Bank", "Pu_Bank", "Pr_Assets", "Pu_Assets", "Cars", "Motorcycles", "Wheeled_tractor", "UBER", "MAC", "WAL-MART", "POST_OFFICES", "MUN_EXPENDIT", "COMP_T")

Brazil_cities_2016 <- Brazil_cities_uniques[ , !(names(Brazil_cities_uniques) %in% drops)]
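One caveat with the `%in%` filter is that a name in `drops` that does not match any column is silently ignored. A minimal sketch of a guard, assuming a made-up data frame (`toy`, `not_found` and `kept` are hypothetical names), uses `setdiff()` to list such names before dropping:

```r
# Hypothetical toy data with three columns
toy <- data.frame(
  CITY       = c("A", "B"),
  GDP_CAPITA = c(1, 2),
  HOTELS     = c(NA, 3),
  stringsAsFactors = FALSE
)

# "BEDS" is deliberately not a column of toy
drops <- c("HOTELS", "BEDS")

# Names requested for removal that do not exist in the data
not_found <- setdiff(drops, names(toy))

# Same filter as above; drop = FALSE keeps a data frame even if one column remains
kept <- toy[, !(names(toy) %in% drops), drop = FALSE]

print(not_found)
print(names(kept))
```

Running `setdiff(drops, names(Brazil_cities_uniques))` in the same way should return an empty character vector if every name in `drops` is spelled exactly as in the data.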

3.1.4.5 Looking for missing dependent variable

If the dependent variable is missing for a city, that city unfortunately cannot be used in our analysis.

Missing_GDP_PC <- Brazil_cities_2016[(is.na(Brazil_cities_2016$GDP_CAPITA))!=0,]
Missing_GDP_PC
##              CITY_STATE            CITY STATE CAPITAL IBGE_RES_POP
## 2702 Lagoa Dos Patos_RS Lagoa Dos Patos    RS       0           NA
## 4482 Santa Teresinha_BA Santa Teresinha    BA       0           NA
## 4606     São Caetano_PE     São Caetano    PE       0           NA
##      IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 2702                NA                NA      NA            NA            NA
## 4482                NA                NA      NA            NA            NA
## 4606                NA                NA      NA            NA            NA
##      IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 2702       NA     NA       NA       NA         NA         NA       NA
## 4482       NA     NA       NA       NA         NA         NA       NA
## 4606       NA     NA       NA       NA         NA         NA       NA
##      IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao      LONG
## 2702                NA   NA         NA               NA            NA        NA
## 4482              4493 0.59      0.549            0.804         0.459 -39.52114
## 4606                NA   NA         NA               NA            NA        NA
##            LAT    ALT     AREA RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 2702        NA     NA 10158.75        <NA>          NA           NA
## 4482 -12.77285 222.51       NA        <NA>          NA           NA
## 4606        NA     NA       NA        <NA>          NA           NA
##      GVA_SERVICES GVA_PUBLIC  GVA_TOTAL  TAXES GDP POP_GDP GDP_CAPITA GVA_MAIN
## 2702           NA         NA          NA    NA  NA      NA         NA     <NA>
## 4482           NA         NA          NA    NA  NA      NA         NA     <NA>
## 4606           NA         NA          NA    NA  NA      NA         NA     <NA>
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2702       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4482       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4606       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2702     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4482     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4606     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##      COMP_U
## 2702     NA
## 4482     NA
## 4606     NA

According to Wikipedia, the number of municipalities in Brazil should amount to 5,570. However, our dataset includes 5,573 rows, which suggests that the 3 cities with missing GDP per capita are probably not counted in the official total. We will therefore remove these cities, assuming they are irrelevant to our study. Source: https://en.wikipedia.org/wiki/Municipalities_of_Brazil

Brazil_cities_allGDPC <- Brazil_cities_2016[(is.na(Brazil_cities_2016$GDP_CAPITA))==0,]
summary((Brazil_cities_allGDPC))
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5570        Length:5570       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5564                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:0.000000   1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :0.000000   Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :0.004847   Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:0.000000   3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##                     NA's   :5          NA's   :5          NA's   :5         
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    3   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  487   1st Qu.:    2801  
##  Median :   3174   Median :   1846   Median :  931   Median :    6170  
##  Mean   :  10303   Mean   :   8859   Mean   : 1463   Mean   :   27595  
##  3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1832   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##  NA's   :7         NA's   :7         NA's   :78      NA's   :5         
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##  NA's   :5          NA's   :5        NA's   :5        NA's   :5       
##    IBGE_15-59         IBGE_60+       IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3841   Median :    722   Median :2782      Median :0.6650  
##  Mean   :  18212   Mean   :   3004   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012   Max.   :5565      Max.   :0.8620  
##  NA's   :5         NA's   :5         NA's   :6         NA's   :6       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :6        NA's   :6        NA's   :6        NA's   :7       
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5570       
##  1st Qu.:-22.838   1st Qu.:   169.7   1st Qu.:   204.43   Class :character  
##  Median :-18.090   Median :   406.5   Median :   415.92   Mode  :character  
##  Mean   :-16.445   Mean   :   894.0   Mean   :  1515.89                     
##  3rd Qu.: -8.489   3rd Qu.:   629.0   3rd Qu.:  1026.38                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##  NA's   :7         NA's   :7          NA's   :1                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4189   1st Qu.:    1726   1st Qu.:    10112   1st Qu.:   17267  
##  Median :  20426   Median :    7424   Median :    31211   Median :   35866  
##  Mean   :  47271   Mean   :  175928   Mean   :   489451   Mean   :  123768  
##  3rd Qu.:  51227   3rd Qu.:   41022   3rd Qu.:   115406   3rd Qu.:   89245  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42253   1st Qu.:     1305   1st Qu.:    43709   1st Qu.:    5483  
##  Median :   119492   Median :     5100   Median :   125153   Median :   11578  
##  Mean   :   832987   Mean   :   118864   Mean   :   954584   Mean   :   36998  
##  3rd Qu.:   313963   3rd Qu.:    22197   3rd Qu.:   329539   3rd Qu.:   25085  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5570        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9058   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15870   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21126                      Mean   :   906.8   Mean   :  18.25  
##  3rd Qu.: 26155                      3rd Qu.:   448.0   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C             COMP_D             COMP_E       
##  Min.   :  0.000   Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000  
##  1st Qu.:  0.000   1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000  
##  Median :  0.000   Median :   11.00   Median :  0.0000   Median :  0.000  
##  Mean   :  1.852   Mean   :   73.44   Mean   :  0.4262   Mean   :  2.029  
##  3rd Qu.:  2.000   3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000  
##  Max.   :274.000   Max.   :31566.00   Max.   :332.0000   Max.   :657.000  
##                                                                           
##      COMP_F             COMP_G             COMP_H          COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1   1st Qu.:    2.00  
##  Median :    4.00   Median :    74.5   Median :    7   Median :    7.00  
##  Mean   :   43.26   Mean   :   348.0   Mean   :   41   Mean   :   55.88  
##  3rd Qu.:   15.00   3rd Qu.:   199.0   3rd Qu.:   25   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515   Max.   :29290.00  
##                                                                          
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.74   Mean   :   15.55   Mean   :   15.14   Mean   :   51.29  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N            COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.0   Min.   :  0.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.0   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.0   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.7   Mean   :  3.269   Mean   :   30.96   Mean   :   34.15  
##  3rd Qu.:   14.0   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.0   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                           
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.18   Mean   :   51.61   Mean   :  0.05027  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 

3.1.4.6 Checking places with missing residential population data

# List the cities whose residential population fields are missing
Brazil_cities_allGDPC[is.na(Brazil_cities_allGDPC$IBGE_RES_POP_ESTR), ]
##                CITY_STATE              CITY STATE CAPITAL IBGE_RES_POP
## 472   Balneário Rincão_SC  Balneário Rincão    SC       0           NA
## 3117  Mojuí Dos Campos_PA  Mojuí Dos Campos    PA       0           NA
## 3581 Paraíso Das Águas_MS Paraíso Das Águas    MS       0           NA
## 3761    Pescaria Brava_SC    Pescaria Brava    SC       0           NA
## 3821    Pinto Bandeira_RS    Pinto Bandeira    RS       0           NA
##      IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 472                 NA                NA      NA            NA            NA
## 3117                NA                NA      NA            NA            NA
## 3581                NA                NA      NA            NA            NA
## 3761                NA                NA      NA            NA            NA
## 3821                NA                NA      NA            NA            NA
##      IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 472        NA     NA       NA       NA         NA         NA       NA
## 3117       NA     NA       NA       NA         NA         NA       NA
## 3581       NA     NA       NA       NA         NA         NA       NA
## 3761       NA     NA       NA       NA         NA         NA       NA
## 3821       NA     NA       NA       NA         NA         NA       NA
##      IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT
## 472                 NA   NA         NA               NA            NA   NA  NA
## 3117                NA   NA         NA               NA            NA   NA  NA
## 3581                NA   NA         NA               NA            NA   NA  NA
## 3761                NA   NA         NA               NA            NA   NA  NA
## 3821                NA   NA         NA               NA            NA   NA  NA
##      ALT    AREA       RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 472   NA   63.43 Sem classificação     2045.03     51257.53     96248.50
## 3117  NA 4988.24 Sem classificação    42123.35         7.20     28168.56
## 3581  NA 5061.43 Sem classificação   210844.60    146514.00     68393.39
## 3761  NA  106.85 Sem classificação     3167.11      5812.35        29.46
## 3821  NA  104.86 Sem classificação    19067.89      4366.36      9652.04
##      GVA_PUBLIC  GVA_TOTAL     TAXES       GDP POP_GDP GDP_CAPITA
## 472    52820.64   202371.69 14863.05 217234.75   12212   17788.63
## 3117   55645.41   133135.10  4177.94 137313.05   15548    8831.56
## 3581   36606.37   462358.36 21594.41 483952.77    5251   92163.92
## 3761   39700.00       78.14  4505.77  82645.86    9908    8341.33
## 3821   14620.12       47.71  4064.74  51771.14    2847   18184.45
##                                                                  GVA_MAIN
## 472                                                       Demais serviços
## 3117 Administração, defesa, educação e saúde públicas e seguridade social
## 3581          Agricultura, inclusive apoio à agricultura e a pós colheita
## 3761 Administração, defesa, educação e saúde públicas e seguridade social
## 3821          Agricultura, inclusive apoio à agricultura e a pós colheita
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 472       270      1      1     16      0      2     47    112      8     13
## 3117       78      0      0      3      0      0      2     14      6      0
## 3581      129      5      1      0      1      2      9     57     21      7
## 3761      105      1      1     22      0      2      6     36      7      3
## 3821       63      1      0     12      0      0      4     18      7      5
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 472       3      6     11     10     23      2      3      6      1      5
## 3117      0      0      0      2      2      0     41      2      0      6
## 3581      1      0      0      4      9      2      3      2      0      5
## 3761      1      0      1      1      1      2     14      0      1      6
## 3821      0      0      2      2      2      1      1      1      3      4
##      COMP_U
## 472       0
## 3117      0
## 3581      0
## 3761      0
## 3821      0

3.1.4.7 Removing unusable data rows

These five cities are missing nearly all of their census-derived fields (population, dwelling units, IDHM and coordinates), so we remove them: we cannot properly estimate their population at the relevant dates unless that data is provided to us. As only 5 cities are affected, the remaining 5565 are more than sufficient for our analysis.

# Keep only the cities that have residential population data
Brazil_cities_allpop <- Brazil_cities_allGDPC[!is.na(Brazil_cities_allGDPC$IBGE_RES_POP_ESTR), ]
summary(Brazil_cities_allpop)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5565        Length:5565       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5559                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:0.000000   1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :0.000000   Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :0.004852   Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:0.000000   3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##                                                                             
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    3   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  487   1st Qu.:    2801  
##  Median :   3174   Median :   1846   Median :  931   Median :    6170  
##  Mean   :  10303   Mean   :   8859   Mean   : 1463   Mean   :   27595  
##  3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1832   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##  NA's   :2         NA's   :2         NA's   :73                        
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##                                                                       
##    IBGE_15-59         IBGE_60+       IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3841   Median :    722   Median :2782      Median :0.6650  
##  Mean   :  18212   Mean   :   3004   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012   Max.   :5565      Max.   :0.8620  
##                                      NA's   :1         NA's   :1       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :1        NA's   :1        NA's   :1        NA's   :2       
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5565       
##  1st Qu.:-22.838   1st Qu.:   169.7   1st Qu.:   204.53   Class :character  
##  Median :-18.090   Median :   406.5   Median :   416.59   Mode  :character  
##  Mean   :-16.445   Mean   :   894.0   Mean   :  1515.39                     
##  3rd Qu.: -8.489   3rd Qu.:   629.0   3rd Qu.:  1025.73                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##  NA's   :2         NA's   :2          NA's   :1                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4193   1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17260  
##  Median :  20430   Median :    7425   Median :    31212   Median :   35809  
##  Mean   :  47263   Mean   :  176049   Mean   :   489855   Mean   :  123844  
##  3rd Qu.:  51238   3rd Qu.:   41011   3rd Qu.:   115521   3rd Qu.:   89316  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42254   1st Qu.:     1303   1st Qu.:    43706   1st Qu.:    5488  
##  Median :   119481   Median :     5107   Median :   125111   Median :   11584  
##  Mean   :   833592   Mean   :   118962   Mean   :   955266   Mean   :   37023  
##  3rd Qu.:   313988   3rd Qu.:    22209   3rd Qu.:   329717   3rd Qu.:   25102  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5565        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9062   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15866   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21119                      Mean   :   907.5   Mean   :  18.27  
##  3rd Qu.: 26155                      3rd Qu.:   449.0   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C            COMP_D             COMP_E      
##  Min.   :  0.000   Min.   :    0.0   Min.   :  0.0000   Min.   :  0.00  
##  1st Qu.:  0.000   1st Qu.:    3.0   1st Qu.:  0.0000   1st Qu.:  0.00  
##  Median :  0.000   Median :   11.0   Median :  0.0000   Median :  0.00  
##  Mean   :  1.853   Mean   :   73.5   Mean   :  0.4264   Mean   :  2.03  
##  3rd Qu.:  2.000   3rd Qu.:   39.0   3rd Qu.:  0.0000   3rd Qu.:  1.00  
##  Max.   :274.000   Max.   :31566.0   Max.   :332.0000   Max.   :657.00  
##                                                                         
##      COMP_F             COMP_G             COMP_H             COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00  
##  Median :    4.00   Median :    75.0   Median :    7.00   Median :    7.00  
##  Mean   :   43.29   Mean   :   348.2   Mean   :   41.02   Mean   :   55.92  
##  3rd Qu.:   15.00   3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515.00   Max.   :29290.00  
##                                                                             
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.76   Mean   :   15.56   Mean   :   15.15   Mean   :   51.34  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N             COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.00   Min.   :  1.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.00   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.77   Mean   :  3.271   Mean   :   30.98   Mean   :   34.18  
##  3rd Qu.:   14.00   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.00   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                            
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.19   Mean   :   51.65   Mean   :  0.05031  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 
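
As a quick check on the filtering step, the sketch below uses a small made-up data frame (`toy_cities`, a hypothetical stand-in, not part of the exercise data) to show how the number of dropped rows can be confirmed and how the remaining NA values can be tallied per column with `colSums(is.na(...))`. Run against `Brazil_cities_allpop`, the same tally should match the `NA's` counts in the summary above, such as the 73 missing `IBGE_DU_RURAL` values.

```r
# Toy stand-in for the city table; names and values are illustrative only
toy_cities <- data.frame(
  CITY = c("A", "B", "C", "D"),
  IBGE_RES_POP_ESTR = c(10, NA, 0, 5),
  IBGE_DU_RURAL = c(NA, NA, 120, 45)
)

# Keep only rows where the population column is present
toy_kept <- toy_cities[!is.na(toy_cities$IBGE_RES_POP_ESTR), ]

# Number of rows dropped by the filter
rows_dropped <- nrow(toy_cities) - nrow(toy_kept)

# Remaining NA count per column, to decide what to clean next
na_per_column <- colSums(is.na(toy_kept))

print(rows_dropped)
print(na_per_column)
```

Counting NAs per column after each cleaning step makes it easier to see which variables still need attention before modelling.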

3.1.4.8 Cleaning IBGE_DU_RURAL values

# List the cities with a missing rural dwelling-unit count
Brazil_cities_allpop[is.na(Brazil_cities_allpop$IBGE_DU_RURAL), ]
##                     CITY_STATE                   CITY STATE CAPITAL
## 70       Águas De São Pedro_SP     Águas De São Pedro    SP       0
## 178                Alvorada_RS               Alvorada    RS       0
## 295                 Aracaju_SE                Aracaju    SE       1
## 296            Araçariguama_SP           Araçariguama    SP       0
## 385      Armação Dos Búzios_RJ     Armação Dos Búzios    RJ       0
## 392         Arraial Do Cabo_RJ        Arraial Do Cabo    RJ       0
## 455         Baía Da Traição_PB        Baía Da Traição    PB       0
## 468      Balneário Camboriú_SC     Balneário Camboriú    SC       0
## 559                 Barueri_SP                Barueri    SP       0
## 588            Belford Roxo_RJ           Belford Roxo    RJ       0
## 593          Belo Horizonte_MG         Belo Horizonte    MG       1
## 707               Bombinhas_SC              Bombinhas    SC       0
## 826                Cabedelo_PB               Cabedelo    PB       0
## 855            Cachoeirinha_RS           Cachoeirinha    RS       0
## 923              Camaragibe_PE             Camaragibe    PE       0
## 978    Campo Limpo Paulista_SP   Campo Limpo Paulista    SP       0
## 1036                 Canoas_RS                 Canoas    RS       0
## 1098            Carapicuíba_SP            Carapicuíba    SP       0
## 1357                Confins_MG                Confins    MG       0
## 1437                  Cotia_SP                  Cotia    SP       0
## 1490                Cubatão_SP                Cubatão    SP       0
## 1509               Curitiba_PR               Curitiba    PR       1
## 1552                Diadema_SP                Diadema    SP       0
## 1657         Embu Das Artes_SP         Embu Das Artes    SP       0
## 1731                Eusébio_CE                Eusébio    CE       0
## 1774    Fernando De Noronha_PE    Fernando De Noronha    PE       0
## 1832              Fortaleza_CE              Fortaleza    CE       1
## 2037              Guarulhos_SP              Guarulhos    SP       0
## 2068            Hortolândia_SP            Hortolândia    SP       0
## 2155          Iguaba Grande_RJ          Iguaba Grande    RJ       0
## 2166          Ilha Comprida_SP          Ilha Comprida    SP       0
## 2181               Imbituba_SC               Imbituba    SC       0
## 2360              Itaparica_BA              Itaparica    BA       0
## 2376                Itapevi_SP                Itapevi    SP       0
## 2401        Itaquaquecetuba_SP        Itaquaquecetuba    SP       0
## 2515                Jandira_SP                Jandira    SP       0
## 2524                 Japeri_RJ                 Japeri    RJ       0
## 2587             Joanópolis_SP             Joanópolis    SP       0
## 2752       Lauro De Freitas_BA       Lauro De Freitas    BA       0
## 2783                Lindóia_SP                Lindóia    SP       0
## 2937               Marcação_PB               Marcação    PB       0
## 3028                   Mauá_SP                   Mauá    SP       0
## 3052               Mesquita_RJ               Mesquita    RJ       0
## 3241                  Natal_RN                  Natal    RN       1
## 3267              Nilópolis_RJ              Nilópolis    RJ       0
## 3274                Niterói_RJ                Niterói    RJ       0
## 3470                 Osasco_SP                 Osasco    SP       0
## 3500              Pacaraima_RR              Pacaraima    RR       0
## 3624             Parnamirim_RN             Parnamirim    RN       0
## 3669               Paulista_PE               Paulista    PE       0
## 3804                Pinhais_PR                Pinhais    PR       0
## 3828               Piracaia_SP               Piracaia    SP       0
## 3850  Pirapora Do Bom Jesus_SP  Pirapora Do Bom Jesus    SP       0
## 3949           Porto Alegre_RS           Porto Alegre    RS       1
## 4002           Praia Grande_SP           Praia Grande    SP       0
## 4074              Queimados_RJ              Queimados    RJ       0
## 4112                 Recife_PE                 Recife    PE       1
## 4179         Ribeirão Pires_SP         Ribeirão Pires    SP       0
## 4209         Rio De Janeiro_RJ         Rio De Janeiro    RJ       1
## 4225    Rio Grande Da Serra_SP    Rio Grande Da Serra    SP       0
## 4378    Santa Cruz De Minas_MG    Santa Cruz De Minas    MG       0
## 4505    Santana De Parnaíba_SP    Santana De Parnaíba    SP       0
## 4537            Santo André_SP            Santo André    SP       0
## 4608     São Caetano Do Sul_SP     São Caetano Do Sul    SP       0
## 4707     São João De Meriti_RJ     São João De Meriti    RJ       0
## 4807           São Lourenço_MG           São Lourenço    MG       0
## 5117        Taboão Da Serra_SP        Taboão Da Serra    SP       0
## 5367               Uiramutã_RR               Uiramutã    RR       0
## 5431    Valparaíso De Goiás_GO    Valparaíso De Goiás    GO       0
## 5443 Vargem Grande Paulista_SP Vargem Grande Paulista    SP       0
## 5459        Várzea Paulista_SP        Várzea Paulista    SP       0
## 5486             Vespasiano_MG             Vespasiano    MG       0
## 5537                Vitória_ES                Vitória    ES       1
##      IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## 70           2707              2693                14     990           990
## 178        195673            195483               190   60221         60221
## 295        571149            570674               475  169830        169830
## 296         17080             16964               116    4940          4940
## 385         27560             27073               487    9030          9030
## 392         27715             27655                60    8940          8940
## 455          8012              8005                 7     875           875
## 468        108089            107010              1079   39333         39333
## 559        240749            239837               912   71821         71821
## 588        469332            468931               401  145726        145726
## 593       2375151           2369063              6088  762924        762924
## 707         14293             14140               153    4627          4627
## 826         57944             57913                31   17180         17180
## 855        118278            118170               108   38888         38888
## 923        144466            144374                92   42291         42291
## 978         74074             73873               201   22030         22030
## 1036       323827            323280               547  103963        103963
## 1098       369584            368853               731  108676        108676
## 1357         5936              5930                 6    1696          1696
## 1437       201150            199966              1184   59017         59017
## 1490       118720            118537               183   36417         36417
## 1509      1751907           1743036              8871  576347        576347
## 1552       386089            385274               815  117397        117397
## 1657       240230            239773               457   68210         68210
## 1731        46033             45936                97   12713         12713
## 1774         2630              2630                 0     590           590
## 1832      2452185           2449109              3076  711478        711478
## 2037      1221979           1216222              5757  360748        360748
## 2068       192692            192329               363   55430         55430
## 2155        22851             22729               122    7582          7582
## 2166         9025              8979                46    3125          3125
## 2181        40170             40046               124   13186         13186
## 2360        20725             20685                40    6364          6364
## 2376       200769            200557               212   57634         57634
## 2401       321770            321071               699   89751         89751
## 2515       108344            108086               258   32545         32545
## 2524        95492             95424                68   28424         28424
## 2587        11768             11715                53    3895          3895
## 2752       163449            162510               939   49533         49533
## 2783         6712              6699                13    2221          2221
## 2937         7609              7609                 0      NA            NA
## 3028       417064            416393               671  125423        125423
## 3052       168376            168020               356   53117         53117
## 3241       803739            802686              1053  235720        235720
## 3267       157425            156947               478   50521         50521
## 3274       487562            483821              3741  169306        169306
## 3470       666740            664447              2293  202009        202009
## 3500        10433             10381                52     354           354
## 3624       202456            202099               357   60388         60388
## 3669       300466            300293               173   90683         90683
## 3804       117008            116814               194   35572         35572
## 3828        25116             25045                71    7825          7825
## 3850        15733             15716                17    4389          4389
## 3949      1409351           1403450              5901  508503        508503
## 4002       262051            260454              1597   83597         83597
## 4074       137962            137864                98   42248         42248
## 4112      1537704           1535289              2415  471252        471252
## 4179       113068            112613               455   33819         33819
## 4209      6320446           6264915             55531 2147235       2147235
## 4225        43974             43936                38   13207         13207
## 4378         7865              7862                 3    2520          2520
## 4505       108813            107879               934   31630         31630
## 4537       676407            672359              4048  216343        216343
## 4608       149263            147306              1957   50518         50518
## 4707       458673            457807               866  147516        147516
## 4807        41657             41405               252   13662         13662
## 5117       244528            243903               625   72337         72337
## 5367         8375              8375                 0      NA            NA
## 5431       132982            132857               125   39440         39440
## 5443        42997             42795               202   12545         12545
## 5459       107089            107047                42   31607         31607
## 5486       104527            104479                48   29820         29820
## 5537       327801            326735              1066  108502        108502
##      IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59
## 70              NA     2687     19       95      135        180       1592
## 178             NA   194483   2857    11832    16525      19930     126085
## 295             NA   566369   7978    32261    39994      46322     388256
## 296             NA    11232    201      808      970       1067       7458
## 385             NA    27400    414     1507     2090       2510      18593
## 392             NA    24002    313     1241     1734       2064      15468
## 455             NA     3076     37      187      272        312       1918
## 468             NA    90291    974     3692     4664       5793      63678
## 559             NA   235508   3621    13818    18217      21665     161351
## 588             NA   468215   6431    26573    37438      46131     307093
## 593             NA  2263631  25105    99493   135491     160227    1553724
## 707             NA     5138     73      279      337        402       3549
## 826             NA    55820    934     3428     4449       4775      37322
## 855             NA   113673   1388     5718     7919       9584      76735
## 923             NA   130739   1711     7083     9997      11106      88037
## 978             NA    72014   1044     4131     5264       6142      48767
## 1036            NA   317311   4124    17051    22966      26445     210363
## 1098            NA   346271   5075    20476    26844      30076     235128
## 1357            NA     5396     61      258      384        481       3668
## 1437            NA   174780   2631    10229    13711      15747     118610
## 1490            NA    69434    801     3460     4658       5294      48044
## 1509            NA  1688975  21318    81910   107086     124078    1162399
## 1552            NA   349014   4962    19917    26172      29822     240126
## 1657            NA   226818   3480    13879    18333      21501     153634
## 1731            NA    41441    716     2709     3619       4451      26865
## 1774            NA     2147     41      137      154        139       1599
## 1832            NA  2366137  32011   129766   169050     199714    1604565
## 2037            NA  1036178  14365    58730    78414      90184     701841
## 2068            NA   187164   2665    10742    14380      17134     127766
## 2155            NA    22836    240     1067     1543       1967      14311
## 2166            NA     5177     55      259      391        462       3151
## 2181            NA    38550    410     1853     2583       3149      25452
## 2360            NA    19132    265     1139     1616       1925      12289
## 2376            NA   196067   3118    13035    17401      20070     129795
## 2401            NA   314720   5082    20794    28493      32792     208097
## 2515            NA   107719   1693     6560     8681      10081      73926
## 2524            NA    88906   1261     5580     7664       9076      57539
## 2587            NA     7995     85      430      518        553       5285
## 2752            NA   151542   2199     8813    11662      13234     104619
## 2783            NA     5437     71      310      342        428       3572
## 2937            NA     2838     45      211      277        266       1701
## 3028            NA   376982   5117    20512    26902      32168     259445
## 3052            NA   167162   2100     8849    12068      14734     109979
## 3241            NA   790062  10364    41558    54619      64785     536124
## 3267            NA   155579   1755     7410    10445      12252     102966
## 3274            NA   409668   3643    14642    19875      23912     271260
## 3470            NA   616068   8089    32305    42733      49379     420590
## 3500            NA     4481     87      400      506        567       2714
## 3624            NA   201036   3041    11801    15437      17684     138581
## 3669            NA   250978   3236    12964    17685      20225     170356
## 3804            NA   115412   1753     6613     9072      10239      78169
## 3828            NA    20157    228      963     1310       1728      13233
## 3850            NA     2785     37      124      178        256       1802
## 3949            NA  1339712  15235    58369    79310      93989     889503
## 4002            NA   249407   3437    14139    18886      21424     159645
## 4074            NA   133313   1866     7824    10858      13204      87112
## 4112            NA  1157593  13606    54720    73132      84879     782716
## 4179            NA   108060   1257     5224     7114       8531      73953
## 4209            NA  5426838  58958   235380   321084     382267    3559037
## 4225            NA    43776    598     2700     3586       4132      29340
## 4378            NA     7861     96      398      612        678       5307
## 4505            NA    76030   1020     4253     5916       6877      51739
## 4537            NA   645047   7233    29315    38363      44476     436194
## 4608            NA   148474   1336     5477     7290       8596      97726
## 4707            NA   446505   5673    23189    32842      39515     294332
## 4807            NA    40784    486     1939     2660       3228      26576
## 5117            NA   241855   3681    14077    18533      21480     164970
## 5367            NA      794     19       83      129        110        424
## 5431            NA   129290   2232     9321    11738      12659      87096
## 5443            NA    42806    646     2566     3393       3919      28596
## 5459            NA   103400   1546     5931     7728       8767      71395
## 5486            NA    84080   1252     4932     6824       7988      56595
## 5537            NA   299922   3494    13665    17384      20693     207460
##      IBGE_60+ IDHM Ranking 2010  IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao
## 70        666                 2 0.850      0.849            0.890         0.825
## 178     17254              1952 0.700      0.694            0.874         0.564
## 295     51558               230 0.770      0.784            0.823         0.708
## 296       728              1784 0.700      0.717            0.814         0.597
## 385      2286              1086 0.728      0.750            0.824         0.624
## 392      3182               940 0.733      0.722            0.805         0.677
## 455       350              4614 0.581      0.541            0.731         0.495
## 468     11490                 4 0.845      0.854            0.894         0.789
## 559     16836                91 0.790      0.791            0.866         0.708
## 588     44549              2358 0.684      0.662            0.808         0.598
## 593    289591                20 0.810      0.841            0.856         0.737
## 707       498               122 0.780      0.753            0.864         0.732
## 826      4912               595 0.748      0.782            0.822         0.651
## 855     12329               435 0.757      0.749            0.857         0.675
## 923     12805              2152 0.690      0.656            0.805         0.628
## 978      6666               243 0.769      0.733            0.840         0.739
## 1036    36362               552 0.750      0.768            0.864         0.636
## 1098    28672               578 0.750      0.721            0.842         0.693
## 1357      544               613 0.747      0.706            0.830         0.711
## 1437    13852               138 0.780      0.789            0.851         0.707
## 1490     7177               856 0.737      0.716            0.821         0.681
## 1509   192184                10 0.823      0.850            0.855         0.768
## 1552    28015               425 0.757      0.717            0.844         0.716
## 1657    15991               915 0.735      0.700            0.839         0.676
## 1731     3081              1877 0.701      0.700            0.794         0.621
## 1774       77                83 0.790      0.781            0.839         0.748
## 1832   231031               467 0.754      0.749            0.824         0.695
## 2037    92644               334 0.763      0.746            0.831         0.717
## 2068    14477               447 0.760      0.716            0.859         0.703
## 2155     3708               363 0.760      0.744            0.841         0.704
## 2166      859              1171 0.725      0.696            0.823         0.666
## 2181     5103               293 0.765      0.734            0.868         0.703
## 2360     1898              2670 0.670      0.657            0.826         0.553
## 2376    12648               900 0.735      0.687            0.855         0.677
## 2401    19462              1510 0.710      0.665            0.844         0.648
## 2515     6778               375 0.760      0.738            0.841         0.706
## 2524     7786              2935 0.659      0.637            0.809         0.555
## 2587     1124              1967 0.700      0.707            0.824         0.585
## 2752    11015               482 0.754      0.781            0.827         0.663
## 2783      714               730 0.742      0.722            0.864         0.654
## 2937      338              5404 0.529      0.525            0.691         0.408
## 3028    32838               276 0.770      0.721            0.852         0.733
## 3052    19432               850 0.737      0.704            0.839         0.678
## 3241    82612               325 0.763      0.768            0.835         0.694
## 3267    20751               498 0.753      0.731            0.817         0.716
## 3274    76336                 7 0.840      0.887            0.854         0.773
## 3470    62972               174 0.780      0.776            0.840         0.718
## 3500      207              3132 0.650      0.624            0.788         0.558
## 3624    14492               278 0.770      0.750            0.825         0.726
## 3669    26512               977 0.732      0.673            0.830         0.703
## 3804     9566               530 0.750      0.761            0.836         0.666
## 3828     2695               816 0.740      0.758            0.851         0.625
## 3850      388              1114 0.727      0.679            0.810         0.698
## 3949   203306                32 0.805      0.867            0.857         0.702
## 4002    31876               478 0.754      0.744            0.834         0.692
## 4074    12449              2460 0.680      0.659            0.810         0.589
## 4112   148540               215 0.772      0.798            0.825         0.698
## 4179    11981               101 0.784      0.749            0.847         0.760
## 4209   870112                46 0.799      0.840            0.845         0.719
## 4225     3420               576 0.750      0.684            0.823         0.745
## 4378      770              1755 0.706      0.660            0.839         0.636
## 4505     6225                16 0.810      0.876            0.849         0.725
## 4537    89466                14 0.815      0.819            0.861         0.769
## 4608    28049                 1 0.862      0.891            0.887         0.811
## 4707    50954              1341 0.720      0.693            0.831         0.646
## 4807     5895               391 0.759      0.746            0.871         0.673
## 5117    19114               240 0.769      0.742            0.863         0.710
## 5367       29              5561 0.453      0.439            0.766         0.276
## 5431     6244               632 0.746      0.733            0.815         0.695
## 5443     3686               237 0.770      0.755            0.884         0.683
## 5459     8033               383 0.759      0.720            0.863         0.705
## 5486     6489              2242 0.688      0.677            0.811         0.592
## 5537    37226                 5 0.845      0.876            0.855         0.805
##           LONG        LAT     ALT    AREA             RURAL_URBAN GVA_AGROPEC
## 70   -47.88397 -22.597340  515.24    3.61 Intermediário Adjacente        0.00
## 178  -51.07773 -29.997493   13.44   71.60                  Urbano     1379.14
## 295  -37.04821 -10.907216    4.29  182.16                  Urbano        2.68
## 296  -47.07155 -23.430041  710.68  145.20 Intermediário Adjacente     1213.11
## 385  -41.88775 -22.757764   10.97   70.98                  Urbano     8563.51
## 392  -42.02834 -22.967638    8.84  152.11                  Urbano    15182.40
## 455  -34.94993  -6.679529    8.14  102.64         Rural Adjacente    14355.43
## 468  -48.63462 -26.991819    9.05   45.21                  Urbano        7.56
## 559  -46.87465 -23.508902  741.57   65.70                  Urbano      315.72
## 588  -43.39962 -22.764556   17.90   78.99                  Urbano     2679.25
## 593  -43.92645 -19.937524  937.53  331.40                  Urbano     2300.08
## 707  -48.52135 -27.144255   73.30   35.14                  Urbano    17434.02
## 826  -34.83943  -6.966983    4.58   29.76                  Urbano        7.13
## 855  -51.09368 -29.950629   14.16   43.90                  Urbano     1109.77
## 923  -34.99572  -8.020522   43.12   51.26                  Urbano     7987.96
## 978  -46.76382 -23.209396  765.88   79.40                  Urbano        7.87
## 1036 -51.18103 -29.918697   19.40  130.79                  Urbano     6887.23
## 1098 -46.84145 -23.535249  785.34   34.55                  Urbano      225.52
## 1357 -43.99560 -19.629948  767.79   42.36                  Urbano      418.24
## 1437 -46.93185 -23.603514  850.25  323.99                  Urbano       22.25
## 1490 -46.42003 -23.883839    6.88  142.88                  Urbano      902.06
## 1509 -49.27185 -25.432956  910.89  435.04                  Urbano    11206.58
## 1552 -46.62338 -23.689295  812.84   30.73                  Urbano        0.77
## 1657 -46.85086 -23.647313  791.83   70.40                  Urbano      270.62
## 1731 -38.44512  -3.886973   33.06   79.01                  Urbano    19350.98
## 1774 -32.43519  -3.852021    0.00   18.61            Rural Remoto      484.83
## 1832 -38.58993  -3.723805   29.91  312.41                  Urbano    47368.39
## 2037 -46.53108 -23.468506  776.36  318.68                  Urbano    40225.68
## 2068 -47.22110 -22.858395  584.89   62.42                  Urbano     1117.40
## 2155 -42.22212 -22.839057    7.03   50.54                  Urbano     2872.62
## 2166 -47.55432 -24.739240    7.93  196.57 Intermediário Adjacente     3788.65
## 2181 -48.66928 -28.239951   22.67  182.91                  Urbano    27516.95
## 2360 -38.68403 -12.881489    3.67  118.04                  Urbano     7265.47
## 2376 -46.93337 -23.546934  743.05   82.66                  Urbano      248.07
## 2401 -46.35160 -23.476897  762.25   82.62                  Urbano    10994.59
## 2515 -46.90522 -23.529939  755.57   17.45                  Urbano      496.13
## 2524 -43.65379 -22.644819   32.94   81.70                  Urbano        7.73
## 2587 -46.27342 -22.930678  924.36  374.29         Rural Adjacente    27249.17
## 2752 -38.32346 -12.896718   18.15   57.66                  Urbano     2219.82
## 2783 -46.66148 -22.520488  717.27   48.76 Intermediário Adjacente        5.86
## 2937 -35.01392  -6.770054   92.93  123.83         Rural Adjacente    23738.38
## 3028 -46.45826 -23.669335  789.33   61.91                  Urbano      609.66
## 3052 -43.42922 -22.768088   20.71   41.49                  Urbano     2592.96
## 3241 -35.25225  -5.750899   44.47  167.40                  Urbano       17.08
## 3267 -43.41661 -22.807514   19.13   19.39                  Urbano        0.00
## 3274 -43.07582 -22.896452  117.78  133.76                  Urbano    17659.85
## 3470 -46.78881 -23.533612  742.97   64.95                  Urbano      947.16
## 3500 -61.14731   4.475259  912.13 8028.48            Rural Remoto     7151.02
## 3624 -35.25921  -5.910370   55.50  124.01                  Urbano    24157.84
## 3669 -34.88479  -7.943188   19.58   96.85                  Urbano    11249.65
## 3804 -49.19920 -25.442198  880.64   60.87                  Urbano     1323.71
## 3828 -46.35876 -23.050499  793.71  385.57 Intermediário Adjacente    24998.16
## 3850 -47.00097 -23.397523  705.51  108.49                  Urbano        0.16
## 3949 -51.22866 -30.030037   42.24  495.39                  Urbano    28354.58
## 4002 -46.41205 -24.003021    8.68  149.25                  Urbano     3653.34
## 4074 -43.55567 -22.717430   33.15   75.70                  Urbano     2996.82
## 4112 -34.88894  -8.062762   10.33  218.84                  Urbano    34667.91
## 4179 -46.41534 -23.707423  757.08   99.08                  Urbano     1821.37
## 4209 -43.22788 -22.876652   11.80 1200.26                  Urbano       81.37
## 4225 -46.39369 -23.744515  762.98   36.34                  Urbano        0.68
## 4378 -44.22326 -21.119471  908.71    3.57                  Urbano      218.05
## 4505 -46.92209 -23.449453  769.83  179.95                  Urbano    94613.78
## 4537 -46.53087 -23.657510  764.10  175.78                  Urbano     1012.60
## 4608 -46.57151 -23.614705  754.99   15.33                  Urbano       23.52
## 4707 -43.37188 -22.802331   15.56   35.22                  Urbano     1034.78
## 4807 -45.05337 -22.117769  888.72   58.02                  Urbano     1715.32
## 5117 -46.78578 -23.623328  803.24   20.39                  Urbano      180.41
## 5367 -60.19572   4.585440  605.80 8065.56            Rural Remoto     9864.83
## 5431 -47.98411 -16.069575 1105.85   60.95                  Urbano        0.43
## 5443 -47.01965 -23.615302  926.93   42.49                  Urbano    32189.74
## 5459 -46.82989 -23.214467  729.74   35.12                  Urbano     1872.03
## 5486 -43.91992 -19.693030  679.03   71.08                  Urbano      791.68
## 5537 -40.32221 -20.320154    0.00   97.12                  Urbano    14437.58
##      GVA_INDUSTRY GVA_SERVICES  GVA_PUBLIC   GVA_TOTAL        TAXES
## 70       10504.94     94367.21    20211.04    125083.20     7533.36
## 178        450.98      1108.76   749842.52   2310961.10   212178.51
## 295    2624804.03   9244912.10     2652.55  14524947.50  1973534.61
## 296     626413.92   1010191.11   104138.74   1741956.88      379.54
## 385        285.57    819191.05   284130.31   1397458.05    77485.55
## 392      87133.35    276890.43   242622.01    621828.18    33333.00
## 455       2162.40     18541.17    39788.97     74847.96        2.56
## 468     671328.06   3006137.08   757157.55   4442184.84   488228.41
## 559       4212.89  28113200.81  1747221.73  34073626.94 13014674.64
## 588     988889.97   3428106.83  3073707.32   7493383.37   790665.42
## 593   11901585.70  53213121.50 10664796.91     75781.80    12495.66
## 707      77136.54    344876.47   105221.09       544.67    48297.74
## 826     426650.48   1283696.98   373485.12      2090.96   397320.13
## 855       1079.27   2399310.59   578588.41   4058276.69     1206.66
## 923      75798.16    739599.70   509551.68   1332937.50   142773.53
## 978     378706.96    811522.38   287912.58   1486007.07   189987.65
## 1036   8291607.70   7300984.36  1599697.49  17199176.79  2329361.47
## 1098    814649.19   2891957.81  1059656.47   4766488.99      447.62
## 1357     65122.69       819.12       39.40    924053.76       82.96
## 1437   2677627.60   5308453.91      894.96   8903289.39  2088168.40
## 1490  10055568.72   4752510.12   795266.65  15604247.55  2063753.92
## 1509  12802824.51  46249176.62  8813208.36     67876.42    15912.49
## 1552   3547136.52   5907985.66  1541552.72     10997.44     2232.30
## 1657   1560323.83      5720.78   843771.86   8125147.04  1879499.86
## 1731   1038868.71   1110685.38   254682.16   2423587.22   644359.77
## 1774      5392.96     95943.85        7.26    109079.16    15244.51
## 1832   9060367.79  35008332.72  8020931.83  52137000.73  8004144.47
## 2037  11091048.69  29551381.17  4840743.94  45523399.48  8451519.21
## 2068   3945253.89      5001.13   841536.81   9789041.14  1726030.36
## 2155        35.90    169977.75   198551.93       407.30       19.63
## 2166    268191.60    189875.31    84032.76    545888.32     9339.32
## 2181    160695.15       773.40   188791.13   1150399.50   213135.17
## 2360     21341.65     93112.91    74443.99    196164.02    13710.64
## 2376   3286882.77      6090.88      806.50  10184508.99  1963153.38
## 2401   1389143.49   3194865.51  1162896.21   5757899.80      749.79
## 2515    717358.59   1612719.13      407.71   2738284.17   681348.11
## 2524    133350.13    420982.38   636777.17   1198835.42    95810.65
## 2587     16273.71    100406.88    47870.14    191799.90    10375.20
## 2752   1225025.66   3357161.11   665466.27      5249.87   854208.18
## 2783     35331.54     66188.94    32392.80    139770.45    11974.48
## 2937      1724.29     11192.86    37551.82     74207.34     1436.36
## 3028      4373.72   6294256.62  1348676.31  12017265.64  1946580.85
## 3052    139944.30    953082.06  1086298.23   2181917.56   109216.36
## 3241   2932165.56  12122912.17  3849761.91  18921922.95  2923557.74
## 3267    149020.37      1290.53   957704.94   2397252.66   143063.11
## 3274   4391714.38  12624595.99  3677333.63  20711303.86  2292039.48
## 3470      3036.39  53111430.66  2627577.24  58776349.73 15626341.33
## 3500      8161.28     29966.90   114622.75    159901.95     4616.44
## 3624       674.82      2422.88  1108079.13   4229938.39   792553.72
## 3669    583029.25   1786743.49     1045.74   3426760.35   389256.87
## 3804   1082404.71      2837.23   534458.77   4455414.76   931155.45
## 3828     71238.25    216934.61    98015.49    411186.50    32706.56
## 3850     48176.37    131861.11    76436.12    256631.46    11722.54
## 3949   6768083.47  48930408.04  6712383.63  62439229.72 10986034.54
## 4002       735.55   3739905.66  1301224.94   5780331.57   400744.30
## 4074   1077940.68   2045289.03   944494.99   4070721.52   599495.51
## 4112   5929707.87  29628615.15  6143514.39  41736505.32     7807.58
## 4179       700.98   1571975.50   431476.15   2706251.06   315587.79
## 4209  36334430.50 177361095.84    47548.35 261325243.88    68106.12
## 4225    162057.04       223.38   149879.29    536005.86    42080.68
## 4378         5.18     36085.18    29902.03     71382.30     3498.58
## 4505   1743146.63      4593.73   658868.33   7090362.10  1394975.41
## 4537      4327.28  15730746.19  2464771.07  22523809.03  3313237.11
## 4608      2950.35      6765.71   980547.41  10696632.39  2590079.00
## 4707    485315.97   5341929.53  2688502.16   8516782.44      894.03
## 4807     63393.89    576631.22   193424.86    835165.29    81645.86
## 5117   2173940.31   4103684.30   885803.95   7163608.97  1186413.66
## 5367      1189.55         4.75       87.28    103089.25        0.59
## 5431    287258.37   1245114.13   551217.67   2084016.30   215292.65
## 5443    383983.22    876724.73   194315.57   1487213.26   258218.45
## 5459    680125.23    965083.82   391572.39   2038653.47   322027.69
## 5486    758627.29   1142201.22   471306.97      2372.93   446829.89
## 5537   3225072.88  11635463.05  1744085.64  16619059.14  5108035.55
##               GDP POP_GDP GDP_CAPITA
## 70      132616.56    3205   41378.02
## 178    2523139.61  207392   12166.04
## 295   16498482.10  641523   25717.68
## 296    2121496.97   20581  103080.36
## 385    1474943.60   31674   46566.38
## 392     655161.18   29077   22531.94
## 455      77405.46    8951    8647.69
## 468    4930413.26  131727   37429.03
## 559   47088301.58  264935  177735.30
## 588    8284048.78  494141   16764.54
## 593   88277462.53 2513451   35122.01
## 707        592.97   18052   32847.65
## 826    2488279.38   66858   37217.38
## 855    5264940.27  126666   41565.54
## 923    1475711.03  155228    9506.73
## 978    1675994.72   81693   20515.77
## 1036  19528538.26  342634   56995.33
## 1098   5214112.51  394465   13218.19
## 1357      1007.01    6545  153860.05
## 1437  10991457.80  233696   47033.14
## 1490     17668.00  127887  138153.22
## 1509     83788.90 1893997   44239.20
## 1552     13229.74  415180   31865.08
## 1657  10004646.90  264448   37832.19
## 1731   3067946.99   51913   59097.86
## 1774       124.32    2974   41803.52
## 1832  60141145.20 2609716   23045.09
## 2037  53974918.69 1337087   40367.54
## 2068  11515071.50  219039   52570.87
## 2155       426.93   26430   16153.08
## 2166    555227.64   10476   52999.97
## 2181   1363534.67   43624   31256.53
## 2360    209874.65   22744    9227.69
## 2376  12147662.36  226488   53634.91
## 2401   6507690.31  356774   18240.37
## 2515   3419632.28  120177   28454.96
## 2524   1294646.07  100562   12874.11
## 2587    202175.10   12837   15749.40
## 2752   6104081.03  194641   31360.72
## 2783    151744.92    7591   19990.11
## 2937     75643.70    8475    8925.51
## 3028  13963846.49  457696   30509.00
## 3052   2291133.91  171020   13396.88
## 3241  21845480.68  877662   24890.54
## 3267   2540315.77  158319   16045.55
## 3274  23003343.34  497883   46202.31
## 3470  74402691.05  696382  106841.78
## 3500    164518.39   12144   13547.30
## 3624   5022492.12  248623   20201.24
## 3669      3816.02  325590   11720.31
## 3804   5386570.20  128256   41998.58
## 3828    443893.06   26841   16537.87
## 3850    268353.99   17913   14980.96
## 3949     73425.26 1481019   49577.53
## 4002      6181.08  304705   20285.44
## 4074   4670217.02  144525   32314.25
## 4112  49544087.54 1625583   30477.73
## 4179   3021838.84  121130   24947.07
## 4209 329431359.90 6498837   50690.82
## 4225    578086.55   48861   11831.25
## 4378        74.88    8489    8820.93
## 4505      8485.34  129261   65644.99
## 4537  25837046.14  712749   36249.85
## 4608     13286.71  158825   83656.30
## 4707      9410.81  460541   20434.26
## 4807    916811.15   45128   20315.79
## 5117   8350022.63  275948   30259.41
## 5367    103680.32    9664   10728.51
## 5431   2299308.95  156419   14699.68
## 5443   1745431.72   49542   35231.35
## 5459   2360681.16  117772   20044.50
## 5486   2819757.05  120510   23398.53
## 5537  21727094.68  359555   60427.74
##                                                                  GVA_MAIN
## 70                                                        Demais serviços
## 178                                                       Demais serviços
## 295                                                       Demais serviços
## 296                                                       Demais serviços
## 385                                                       Demais serviços
## 392  Administração, defesa, educação e saúde públicas e seguridade social
## 455  Administração, defesa, educação e saúde públicas e seguridade social
## 468                                                       Demais serviços
## 559                                                       Demais serviços
## 588  Administração, defesa, educação e saúde públicas e seguridade social
## 593                                                       Demais serviços
## 707                                                       Demais serviços
## 826           Comércio e reparação de veículos automotores e motocicletas
## 855                                                       Demais serviços
## 923                                                       Demais serviços
## 978                                                       Demais serviços
## 1036                                          Indústrias de transformação
## 1098                                                      Demais serviços
## 1357                                                      Demais serviços
## 1437                                                      Demais serviços
## 1490                                          Indústrias de transformação
## 1509                                                      Demais serviços
## 1552                                                      Demais serviços
## 1657          Comércio e reparação de veículos automotores e motocicletas
## 1731                                                      Demais serviços
## 1774                                                      Demais serviços
## 1832                                                      Demais serviços
## 2037                                                      Demais serviços
## 2068                                                      Demais serviços
## 2155 Administração, defesa, educação e saúde públicas e seguridade social
## 2166                                                Indústrias extrativas
## 2181                                                      Demais serviços
## 2360 Administração, defesa, educação e saúde públicas e seguridade social
## 2376                                                      Demais serviços
## 2401                                                      Demais serviços
## 2515                                                      Demais serviços
## 2524 Administração, defesa, educação e saúde públicas e seguridade social
## 2587                                                      Demais serviços
## 2752                                                      Demais serviços
## 2783                                                      Demais serviços
## 2937 Administração, defesa, educação e saúde públicas e seguridade social
## 3028                                                      Demais serviços
## 3052 Administração, defesa, educação e saúde públicas e seguridade social
## 3241                                                      Demais serviços
## 3267                                                      Demais serviços
## 3274                                                      Demais serviços
## 3470                                                      Demais serviços
## 3500 Administração, defesa, educação e saúde públicas e seguridade social
## 3624                                                      Demais serviços
## 3669                                                      Demais serviços
## 3804                                                      Demais serviços
## 3828                                                      Demais serviços
## 3850                                                      Demais serviços
## 3949                                                      Demais serviços
## 4002                                                      Demais serviços
## 4074                                                      Demais serviços
## 4112                                                      Demais serviços
## 4179                                                      Demais serviços
## 4209                                                      Demais serviços
## 4225                                                      Demais serviços
## 4378                                                      Demais serviços
## 4505                                                      Demais serviços
## 4537                                                      Demais serviços
## 4608                                                      Demais serviços
## 4707                                                      Demais serviços
## 4807                                                      Demais serviços
## 5117                                                      Demais serviços
## 5367 Administração, defesa, educação e saúde públicas e seguridade social
## 5431                                                      Demais serviços
## 5443                                                      Demais serviços
## 5459                                                      Demais serviços
## 5486                                                      Demais serviços
## 5537                                                      Demais serviços
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 70        202      2      1      4      0      1      6     99      3     40
## 178      2994      3      2    382      0      9    309   1324    180    152
## 295     14534     23     14    721      5     21    717   5262    358   1078
## 296       640      2      7     74      0      2     36    241     53     42
## 385      1674      1      1     30      0      5     55    520     27    512
## 392       648      1      1     14      0      8     53    198     27    103
## 455        85      0      0      0      0      1      0     39      1     10
## 468      9507      2      1    359      0     14    821   3064    151    958
## 559     12513     14      7    711      9     34    713   3432    661    816
## 588      3751      1      2    286      0     14    251   1581    165    172
## 593    103867    226    139   5221     96    156   6235  28240   2733   6514
## 707      1447      0      0     64      0      5    150    386     18    349
## 826      1370      1      1    129      0      7    115    482     40     73
## 855      4453      4      0    652      0     10    319   1772    323    218
## 923      1328      2      0    127      0      4     42    659     28     85
## 978      1198     16      0    151      0      2     76    459     52     81
## 1036    11222      3      7    998      2     35    739   4377    922    563
## 1098     5413      3      0    389      0     19    390   2365    351    367
## 1357      201      1      0     29      0      0     12     67     15     19
## 1437     7630     23      2    640      0     21    416   2441    363    387
## 1490     1856      0      0     61      1      7    155    648    218    194
## 1509   101929    227     34   6025    109    163   6373  33566   3873   5855
## 1552     8064      3      0   1550      0     30    352   3080    492    504
## 1657     3708     13      3    245      1     24    256   1597    213    231
## 1731     2387     11      2    288      0     14    203    732     74     55
## 1774      236      2      0      4      0      0      1     39      9    124
## 1832    57476    132     30   5739     39    111   3008  20909   1627   3744
## 2037    28349     35      8   2847      1    111   1304  11186   2381   1879
## 2068     4324     10      0    380      0     13    379   1747    261    264
## 2155      331      1      0     13      0      1     20    137      5     16
## 2166      292      1      0      7      0      1     15    144      3     64
## 2181     1548      4      9    106      0      6     92    620    125    219
## 2360      163      0      0      4      0      1      5     84      3     24
## 2376     2399      3      0    168      0      6    192    973    155    147
## 2401     4336     34      3    647      0     27    275   1902    224    237
## 2515     1967      0      0    204      0      5    162    752    172    104
## 2524      614      1      4     44      0      3     47    241     34     33
## 2587      522     39      0     58      0      0     13    229     21     35
## 2752     7118     14      6    575      0     14    572   2500    207    359
## 2783      262      3      1     38      0      0     23     84      5     32
## 2937       36      2      0      2      0      0      0     15      1      1
## 3028     6448      3      1    875      1     37    406   2515    285    403
## 3052     1468      1      0    126      0      5     86    641     51     86
## 3241    21530     57     16   1156     21     37   1770   7592    382   1682
## 3267     2022      2      0    127      0      2     98    857     62    157
## 3274    17097     13      9    603      5     24    869   4623    293   1268
## 3470    15315     11      2    863      1     29    703   6002   1026   1230
## 3500      108      0      0      1      0      0      3     86      3      5
## 3624     4074     10      2    342      0     14    326   1776    131    259
## 3669     3007     10      0    309      1     11    140   1334     65    176
## 3804     5275      5      1   1018      0     21    445   1943    219    275
## 3828      773     32      3    101      0      3     39    305      8     59
## 3850      429      1      5     20      0      0     12    101     10     18
## 3949    80082    196     31   3482     57     95   4039  21550   2523   4205
## 4002     7418      3      0    191      0     14    538   2334    127    608
## 4074     1177      1      2     78      0      6     93    516     44     73
## 4112    40041     84     17   2059     53     63   1967  13147   1176   2882
## 4179     2751     10      1    271      0      5    138   1010    150    174
## 4209   190038    172    274   6824    235    272   7797  47545   4825  12289
## 4225      424      5      0     25      0      3     37    186     29     28
## 4378      205      0      1     76      0      0      3     77      7      9
## 4505     8909      9      6    463      1     14    351   1749    261    178
## 4537    24972     11      1   1723      1     36   1340   8480    999   1571
## 4608     9735      3      0    714      1      7    336   2921    328    679
## 4707     5010      2      0    496      0     20    169   2213    237    320
## 4807     1684      2      6     95      0      2     53    764     33    175
## 5117     5135      3      1    513      0     18    332   2060    199    306
## 5367        8      0      0      0      0      0      0      7      0      0
## 5431     2218      1      0    149      0      8    193    976     39    139
## 5443     1360      2      0    149      0      5    104    502     54     68
## 5459     1975      8      0    388      0     13    134    782    137    105
## 5486     1625      3      3    162      0      8    138    594     57     99
## 5537    17924     30     16    535      4     17    850   4405    323   1185
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 70        5      2      1      9     15      2      3      3      2      4
## 178      46     14     34     82    172      2     97     37     32    117
## 295     349    182    287   1081   1634     38    565    971    265    963
## 296       9      4     17     24     61      2     22     10      9     25
## 385      17      4     35     35    298      3     26     15     26     64
## 392       8      0      9     17    125      4     22      8      7     43
## 455       0      0      0      1      0      3      9      1      0     20
## 468     140    149    482    466   1976      7    174    250    119    374
## 559     859    820    516   1222   1469      4    322    303    129    472
## 588      35     15     18     76    171      3    239     88     54    580
## 593    3951   3501   2785  11925  17752     85   3325   4217   1436   5327
## 707      14     11     57     27    291      3     11      9     15     37
## 826       6      8     13     20    347      4     32     15     20     57
## 855      99     66     63    201    366      2     86     74     36    162
## 923      24      3     11     27     95      6     78     32     19     86
## 978      38      9     18     56     73      2     84     31     10     40
## 1036    271    159    191    656   1053      4    346    268    150    478
## 1098    164     42     31    177    432      2    240     97     53    291
## 1357      2      1      1     14     21      2      2      3      6      6
## 1437    403    174    197    599    915      4    335    194    116    398
## 1490     22     15     11     87    172      4     78     45     34    104
## 1509   4535   3197   2527   9130  12987     89   3030   4197   1307   4697
## 1552    160     93     91    258    620      4    288    114     76    349
## 1657     92     41     47    142    334      4    184     52     54    175
## 1731     60     86     55    217    218      3     57    226     21     65
## 1774      1      0      0      4     34      1      4      0      8      5
## 1832   1247    902   1113   3321   6545    120   2379   2156    849   3504
## 2037    704    398    464   1155   2413      5    992    696    248   1522
## 2068    145     62     49    210    291      3    190    104     46    170
## 2155      7      1      4      7     53      3     23      7      7     26
## 2166      6      0      2      5     18      2      8      6      0     10
## 2181     22      3     21     55    104      2     36     41     24     59
## 2360      0      0      0      2     15      2      8      4      2      9
## 2376     67     21     24     98    221      2     96     59     23    144
## 2401     50     34     39     88    232      3    158     53     40    290
## 2515     61     15     21     91    159      7     53     26     11    124
## 2524      0      2      0      9     26      4     29     16      9    112
## 2587     10      9     12     20     18      3     17     11      5     22
## 2752    162    100    184    647    946      2    193    221    106    310
## 2783      1      4      2      5     35      2      7      4      5     11
## 2937      0      0      0      1      1      2      5      0      0      6
## 3028    148     71     78    218    492      3    289    159     83    381
## 3052     31      7      7     41    107      3     75     30     24    147
## 3241    430    300    551   1454   2453     44    844   1165    333   1243
## 3267     29     10     18     71    153      4    110     99     29    194
## 3274    526    304    408   1421   3144     17    695   1111    322   1442
## 3470    706    271    238    761   1535      6    527    492    142    770
## 3500      0      0      0      1      3      2      0      1      0      3
## 3624     38     23     97    136    399      3    181     96     63    178
## 3669     64     24     28    132    258      4    193     80     38    140
## 3804    138     73     73    279    347      5    120     91     67    155
## 3828     12     17     11     34     53      3     25     24      9     35
## 3850    179      1      1     21     20      3     19      5      5      8
## 3949   3555   2461   1924   8139  16271     72   2217   3489   1339   4429
## 4002     84     49    149    167   2487      2    198    109     88    270
## 4074     16      2      7     31     44      3     49     39     14    159
## 4112   1123    818    767   3101   6353     91   1572   1698    489   2574
## 4179    188     33     24    137    210      2    111    125     36    126
## 4209   9070   6327   4281  19248  34812    120   6744   9905   5039  14224
## 4225      8      4      4     12     18      2     28     11      3     21
## 4378      3      0      1      7      5      1      2      1      3      9
## 4505   1755    443    240   1610   1143      3    164    125    116    278
## 4537   1239    574    450   1798   3303     27    880   1078    282   1179
## 4608    595    258    197    704   1719      5    270    418    110    470
## 4707     46     21     31    140    235      3    263    159     75    580
## 4807     27     24     20     59    157      4     54    110     27     72
## 5117    165     68     81    180    567      3    218    108     44    269
## 5367      0      0      0      0      0      1      0      0      0      0
## 5431     25     23     56     61    208      3    141     67     31     98
## 5443     61     22     21     76    114      2     62     24     16     78
## 5459     32      9     16     54    128      2     66     27     12     62
## 5486     18      6     21     66    128      4     83    143     19     73
## 5537    609    593    539   2207   3129     55    568   1300    230   1329
##      COMP_U
## 70        0
## 178       0
## 295       0
## 296       0
## 385       0
## 392       0
## 455       0
## 468       0
## 559       0
## 588       0
## 593       3
## 707       0
## 826       0
## 855       0
## 923       0
## 978       0
## 1036      0
## 1098      0
## 1357      0
## 1437      2
## 1490      0
## 1509      8
## 1552      0
## 1657      0
## 1731      0
## 1774      0
## 1832      1
## 2037      0
## 2068      0
## 2155      0
## 2166      0
## 2181      0
## 2360      0
## 2376      0
## 2401      0
## 2515      0
## 2524      0
## 2587      0
## 2752      0
## 2783      0
## 2937      0
## 3028      0
## 3052      0
## 3241      0
## 3267      0
## 3274      0
## 3470      0
## 3500      0
## 3624      0
## 3669      0
## 3804      0
## 3828      0
## 3850      0
## 3949      8
## 4002      0
## 4074      0
## 4112      7
## 4179      0
## 4209     35
## 4225      0
## 4378      0
## 4505      0
## 4537      0
## 4608      0
## 4707      0
## 4807      0
## 5117      0
## 5367      0
## 5431      0
## 5443      0
## 5459      0
## 5486      0
## 5537      0

Comparing IBGE_DU with IBGE_DU_URBAN in the rows above shows that the two values match: every domestic unit in these municipalities is classified as urban. The NA values in IBGE_DU_RURAL are therefore missing zeros, so we fill the missing values in that column with 0.
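
The comparison described above can also be written as a guarded fill, where a missing rural count is set to 0 only when IBGE_DU equals IBGE_DU_URBAN. This is a minimal sketch on a made-up toy data frame; `cities` and all of its values are hypothetical stand-ins for Brazil_cities_allpop, not the code used in this exercise.

```r
# Toy data frame standing in for Brazil_cities_allpop (all values are made up).
cities <- data.frame(
  CITY_STATE    = c("A_XX", "B_XX", "C_XX", "D_XX"),
  IBGE_DU       = c(100, 250, NA, 80),
  IBGE_DU_URBAN = c(100, 200, NA, 80),
  IBGE_DU_RURAL = c(NA, 50, NA, NA)
)

# Rows whose rural count is missing.
rural_na <- is.na(cities$IBGE_DU_RURAL)

# Among those, rows where every domestic unit is already counted as urban.
# The is.na() guards keep rows with a missing total or urban count out of the fill.
all_urban <- rural_na &
  !is.na(cities$IBGE_DU) &
  !is.na(cities$IBGE_DU_URBAN) &
  cities$IBGE_DU == cities$IBGE_DU_URBAN

# Fill only the rows that pass the check.
cities$IBGE_DU_RURAL[all_urban] <- 0

# Number of rows still missing a rural count after the guarded fill.
remaining <- sum(is.na(cities$IBGE_DU_RURAL))
print(remaining)
```

Rows that stay NA (in the toy data, the row whose IBGE_DU is itself missing) are left for separate handling instead of being silently set to 0.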

3.1.4.9 Replacing Missing Values with 0 as per observation

Brazil_cities_allpop$IBGE_DU_RURAL[is.na(Brazil_cities_allpop$IBGE_DU_RURAL)] <- 0

summary(Brazil_cities_allpop)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5565        Length:5565       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5559                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:0.000000   1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :0.000000   Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :0.004852   Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:0.000000   3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##                                                                             
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    0   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  471   1st Qu.:    2801  
##  Median :   3174   Median :   1846   Median :  918   Median :    6170  
##  Mean   :  10303   Mean   :   8859   Mean   : 1443   Mean   :   27595  
##  3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1813   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##  NA's   :2         NA's   :2                                           
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##                                                                       
##    IBGE_15-59         IBGE_60+       IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3841   Median :    722   Median :2782      Median :0.6650  
##  Mean   :  18212   Mean   :   3004   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012   Max.   :5565      Max.   :0.8620  
##                                      NA's   :1         NA's   :1       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :1        NA's   :1        NA's   :1        NA's   :2       
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5565       
##  1st Qu.:-22.838   1st Qu.:   169.7   1st Qu.:   204.53   Class :character  
##  Median :-18.090   Median :   406.5   Median :   416.59   Mode  :character  
##  Mean   :-16.445   Mean   :   894.0   Mean   :  1515.39                     
##  3rd Qu.: -8.489   3rd Qu.:   629.0   3rd Qu.:  1025.73                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##  NA's   :2         NA's   :2          NA's   :1                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4193   1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17260  
##  Median :  20430   Median :    7425   Median :    31212   Median :   35809  
##  Mean   :  47263   Mean   :  176049   Mean   :   489855   Mean   :  123844  
##  3rd Qu.:  51238   3rd Qu.:   41011   3rd Qu.:   115521   3rd Qu.:   89316  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42254   1st Qu.:     1303   1st Qu.:    43706   1st Qu.:    5488  
##  Median :   119481   Median :     5107   Median :   125111   Median :   11584  
##  Mean   :   833592   Mean   :   118962   Mean   :   955266   Mean   :   37023  
##  3rd Qu.:   313988   3rd Qu.:    22209   3rd Qu.:   329717   3rd Qu.:   25102  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5565        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9062   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15866   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21119                      Mean   :   907.5   Mean   :  18.27  
##  3rd Qu.: 26155                      3rd Qu.:   449.0   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C            COMP_D             COMP_E      
##  Min.   :  0.000   Min.   :    0.0   Min.   :  0.0000   Min.   :  0.00  
##  1st Qu.:  0.000   1st Qu.:    3.0   1st Qu.:  0.0000   1st Qu.:  0.00  
##  Median :  0.000   Median :   11.0   Median :  0.0000   Median :  0.00  
##  Mean   :  1.853   Mean   :   73.5   Mean   :  0.4264   Mean   :  2.03  
##  3rd Qu.:  2.000   3rd Qu.:   39.0   3rd Qu.:  0.0000   3rd Qu.:  1.00  
##  Max.   :274.000   Max.   :31566.0   Max.   :332.0000   Max.   :657.00  
##                                                                         
##      COMP_F             COMP_G             COMP_H             COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00  
##  Median :    4.00   Median :    75.0   Median :    7.00   Median :    7.00  
##  Mean   :   43.29   Mean   :   348.2   Mean   :   41.02   Mean   :   55.92  
##  3rd Qu.:   15.00   3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515.00   Max.   :29290.00  
##                                                                             
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.76   Mean   :   15.56   Mean   :   15.15   Mean   :   51.34  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N             COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.00   Min.   :  1.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.00   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.77   Mean   :  3.271   Mean   :   30.98   Mean   :   34.18  
##  3rd Qu.:   14.00   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.00   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                            
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.19   Mean   :   51.65   Mean   :  0.05031  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 
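
The full summary is long, so the columns that still contain NA values can be hard to spot. A minimal sketch of a more compact check follows, assuming a toy data frame (`cities` and its values are hypothetical); the same two lines could be pointed at Brazil_cities_allpop.

```r
# Toy data frame standing in for Brazil_cities_allpop (all values are made up).
cities <- data.frame(
  IBGE_DU = c(10, NA, 30, NA),
  IDHM    = c(0.6, 0.7, NA, 0.5),
  AREA    = c(1.5, 2.5, 3.5, 4.5)
)

# Count missing values per column.
na_counts <- colSums(is.na(cities))

# Keep only the columns that still have missing values.
na_remaining <- na_counts[na_counts > 0]
print(na_remaining)
```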

3.1.4.10 Dealing with missing IBGE_DU

In the reference, IBGE_DU stands for “Domestic Units”. On further investigation, this refers to Permanent Private Housing Units, as stated in the description at the top of the source webpage. Source: https://sidra.ibge.gov.br/tabela/3495

Unfortunately, the source data does not provide the values we need. However, the report on the IBGE website serves as an alternate data source from which we can estimate them. The values are not exact because of corrections made later on, but after checking them against cities whose IBGE_DU values are known, such as Petrolina and Sao Paulo, we can confirm that the data is at least somewhat accurate.

Source: https://cidades.ibge.gov.br/brasil/pb/marcacao/pesquisa/23/25124?tipo=ranking&indicador=29522

From this, we can make a reasonable estimate.
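
The cross-check against cities with known IBGE_DU values can be expressed as a relative difference. The sketch below is illustrative only: the city labels, the figures, and the 5% tolerance are all hypothetical assumptions, not values taken from IBGE.

```r
# Hypothetical figures for illustration only; these are NOT real IBGE values.
check <- data.frame(
  CITY       = c("City A", "City B"),
  IBGE_DU    = c(1000, 5000),  # value already present in the data set
  ALT_SOURCE = c(1030, 4900)   # value read from the alternate source
)

# Relative difference between the alternate source and the known value.
check$REL_DIFF <- abs(check$ALT_SOURCE - check$IBGE_DU) / check$IBGE_DU

# Assumed tolerance for calling the alternate source "close enough".
tolerance <- 0.05
check$WITHIN_TOL <- check$REL_DIFF <= tolerance
print(check)
```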

Brazil_cities_allpop[is.na(Brazil_cities_allpop$IBGE_DU), ]
##       CITY_STATE     CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 2937 Marcação_PB Marcação    PB       0         7609              7609
## 5367 Uiramutã_RR Uiramutã    RR       0         8375              8375
##      IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 2937                 0      NA            NA             0     2838     45
## 5367                 0      NA            NA             0      794     19
##      IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IDHM Ranking 2010  IDHM
## 2937      211      277        266       1701      338              5404 0.529
## 5367       83      129        110        424       29              5561 0.453
##      IDHM_Renda IDHM_Longevidade IDHM_Educacao      LONG       LAT    ALT
## 2937      0.525            0.691         0.408 -35.01392 -6.770054  92.93
## 5367      0.439            0.766         0.276 -60.19572  4.585440 605.80
##         AREA     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 2937  123.83 Rural Adjacente    23738.38      1724.29     11192.86   37551.82
## 5367 8065.56    Rural Remoto     9864.83      1189.55         4.75      87.28
##       GVA_TOTAL    TAXES      GDP POP_GDP GDP_CAPITA
## 2937    74207.34 1436.36  75643.7    8475    8925.51
## 5367   103089.25    0.59 103680.3    9664   10728.51
##                                                                  GVA_MAIN
## 2937 Administração, defesa, educação e saúde públicas e seguridade social
## 5367 Administração, defesa, educação e saúde públicas e seguridade social
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2937       36      2      0      2      0      0      0     15      1      1
## 5367        8      0      0      0      0      0      0      7      0      0
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2937      0      0      0      1      1      2      5      0      0      6
## 5367      0      0      0      0      0      1      0      0      0      0
##      COMP_U
## 2937      0
## 5367      0

3.1.4.11 Replacing missing values with externally sourced data

Brazil_cities_allpop$IBGE_DU[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 2040
Brazil_cities_allpop$IBGE_DU_URBAN[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 824
Brazil_cities_allpop$IBGE_DU_RURAL[which(Brazil_cities_allpop$CITY_STATE == "Marcação_PB")] <- 1216

Brazil_cities_allpop$IBGE_DU[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 1444
Brazil_cities_allpop$IBGE_DU_URBAN[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 219
Brazil_cities_allpop$IBGE_DU_RURAL[which(Brazil_cities_allpop$CITY_STATE == "Uiramutã_RR")] <- 1225
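
The six assignments above could also be driven by a small lookup table, which allows a consistency check (total = urban + rural) before anything is written. This is a minimal sketch, assuming a two-row stand-in for Brazil_cities_allpop; the names `cities` and `patch` are hypothetical, while the replacement values are the ones used above.

```r
# Minimal stand-in for Brazil_cities_allpop with only the two affected rows.
cities <- data.frame(
  CITY_STATE    = c("Marcação_PB", "Uiramutã_RR"),
  IBGE_DU       = c(NA_real_, NA_real_),
  IBGE_DU_URBAN = c(NA_real_, NA_real_),
  IBGE_DU_RURAL = c(0, 0)
)

# Replacement values, copied from the assignments above.
patch <- data.frame(
  CITY_STATE    = c("Marcação_PB", "Uiramutã_RR"),
  IBGE_DU       = c(2040, 1444),
  IBGE_DU_URBAN = c(824, 219),
  IBGE_DU_RURAL = c(1216, 1225)
)

# The total must equal urban + rural before the values are written.
stopifnot(patch$IBGE_DU == patch$IBGE_DU_URBAN + patch$IBGE_DU_RURAL)

# Locate each city and fail loudly if a key does not match any row.
rows <- match(patch$CITY_STATE, cities$CITY_STATE)
stopifnot(!anyNA(rows))

# Write all three columns in one step.
cols <- c("IBGE_DU", "IBGE_DU_URBAN", "IBGE_DU_RURAL")
cities[rows, cols] <- patch[, cols]
print(cities)
```

Keeping the values in one table makes a mistyped city key or an inconsistent total fail immediately instead of leaving an NA behind.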

summary(Brazil_cities_allpop)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5565        Length:5565       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5559                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:0.000000   1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :0.000000   Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :0.004852   Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:0.000000   3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##                                                                             
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    0   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  472   1st Qu.:    2801  
##  Median :   3174   Median :   1844   Median :  919   Median :    6170  
##  Mean   :  10300   Mean   :   8856   Mean   : 1444   Mean   :   27595  
##  3rd Qu.:   6725   3rd Qu.:   4621   3rd Qu.: 1813   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##                                                                        
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##                                                                       
##    IBGE_15-59         IBGE_60+       IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3841   Median :    722   Median :2782      Median :0.6650  
##  Mean   :  18212   Mean   :   3004   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012   Max.   :5565      Max.   :0.8620  
##                                      NA's   :1         NA's   :1       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :1        NA's   :1        NA's   :1        NA's   :2       
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5565       
##  1st Qu.:-22.838   1st Qu.:   169.7   1st Qu.:   204.53   Class :character  
##  Median :-18.090   Median :   406.5   Median :   416.59   Mode  :character  
##  Mean   :-16.445   Mean   :   894.0   Mean   :  1515.39                     
##  3rd Qu.: -8.489   3rd Qu.:   629.0   3rd Qu.:  1025.73                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##  NA's   :2         NA's   :2          NA's   :1                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4193   1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17260  
##  Median :  20430   Median :    7425   Median :    31212   Median :   35809  
##  Mean   :  47263   Mean   :  176049   Mean   :   489855   Mean   :  123844  
##  3rd Qu.:  51238   3rd Qu.:   41011   3rd Qu.:   115521   3rd Qu.:   89316  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42254   1st Qu.:     1303   1st Qu.:    43706   1st Qu.:    5488  
##  Median :   119481   Median :     5107   Median :   125111   Median :   11584  
##  Mean   :   833592   Mean   :   118962   Mean   :   955266   Mean   :   37023  
##  3rd Qu.:   313988   3rd Qu.:    22209   3rd Qu.:   329717   3rd Qu.:   25102  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5565        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9062   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15866   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21119                      Mean   :   907.5   Mean   :  18.27  
##  3rd Qu.: 26155                      3rd Qu.:   449.0   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C            COMP_D             COMP_E      
##  Min.   :  0.000   Min.   :    0.0   Min.   :  0.0000   Min.   :  0.00  
##  1st Qu.:  0.000   1st Qu.:    3.0   1st Qu.:  0.0000   1st Qu.:  0.00  
##  Median :  0.000   Median :   11.0   Median :  0.0000   Median :  0.00  
##  Mean   :  1.853   Mean   :   73.5   Mean   :  0.4264   Mean   :  2.03  
##  3rd Qu.:  2.000   3rd Qu.:   39.0   3rd Qu.:  0.0000   3rd Qu.:  1.00  
##  Max.   :274.000   Max.   :31566.0   Max.   :332.0000   Max.   :657.00  
##                                                                         
##      COMP_F             COMP_G             COMP_H             COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00  
##  Median :    4.00   Median :    75.0   Median :    7.00   Median :    7.00  
##  Mean   :   43.29   Mean   :   348.2   Mean   :   41.02   Mean   :   55.92  
##  3rd Qu.:   15.00   3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515.00   Max.   :29290.00  
##                                                                             
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.76   Mean   :   15.56   Mean   :   15.15   Mean   :   51.34  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N             COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.00   Min.   :  1.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.00   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.77   Mean   :  3.271   Mean   :   30.98   Mean   :   34.18  
##  3rd Qu.:   14.00   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.00   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                            
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.19   Mean   :   51.65   Mean   :  0.05031  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 

3.1.4.12 Investigating missing LONG, LAT and ALT values

Brazil_cities_allpop[is.na(Brazil_cities_allpop$LONG), ]
##              CITY_STATE            CITY STATE CAPITAL IBGE_RES_POP
## 3806 Pinhal Da Serra_RS Pinhal Da Serra    RS       0         2130
## 4490 Santa Terezinha_BA Santa Terezinha    BA       0         9648
##      IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 3806              2130                 0     745           180           565
## 4490              9648                 0    2891           734          2157
##      IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 3806      478     11       22       34         32        312       67
## 4490     2332     40      126      191        217       1419      339
##      IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG LAT
## 3806              3121 0.65      0.641            0.835         0.513   NA  NA
## 4490                NA   NA         NA               NA            NA   NA  NA
##      ALT   AREA     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 3806  NA 438.11 Rural Adjacente     56030.9    267670.32        15.85
## 4490  NA 719.26 Rural Adjacente     13235.2      5398.61     17754.37
##      GVA_PUBLIC  GVA_TOTAL     TAXES       GDP POP_GDP GDP_CAPITA
## 3806   19831.52      359.38 25222.60 384602.56    2115  181845.18
## 4490   32630.97    69019.14  3149.33  72168.48   10619    6796.16
##                                                                                  GVA_MAIN
## 3806 Eletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## 4490                 Administração, defesa, educação e saúde públicas e seguridade social
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 3806       45      1      0      2      1      1      3     23      2      4
## 4490       74      2      1      4      0      0      3     37      0      3
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 3806      0      0      0      0      1      2      1      1      0      3
## 4490      1      0      0      1      2      2     12      2      0      4
##      COMP_U
## 3806      0
## 4490      0

3.1.4.13 Replacing Missing Longitude, Latitude and Altitude values

Source: https://www.latlong.net/

Source: https://www.freemaptools.com/elevation-finder.htm

Brazil_cities_allpop$LONG[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- -51.171909
Brazil_cities_allpop$LAT[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- -27.874420
Brazil_cities_allpop$ALT[which(Brazil_cities_allpop$CITY_STATE == "Pinhal Da Serra_RS")] <- 918

Brazil_cities_allpop$LONG[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- -39.5184
Brazil_cities_allpop$LAT[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- -12.7498
Brazil_cities_allpop$ALT[which(Brazil_cities_allpop$CITY_STATE == "Santa Terezinha_BA")] <- 210
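
Hand-entered coordinates are easy to mistype or swap, so a quick range check is worthwhile. The sketch below tests the values entered above against the LONG and LAT extremes reported by the earlier summary() output; the names `patched`, `long_range` and `lat_range` are hypothetical, and the bounds are the observed minimum and maximum, not an official bounding box.

```r
# Coordinates entered by hand in the assignments above.
patched <- data.frame(
  CITY_STATE = c("Pinhal Da Serra_RS", "Santa Terezinha_BA"),
  LONG = c(-51.171909, -39.5184),
  LAT  = c(-27.874420, -12.7498)
)

# Observed extremes of LONG and LAT from the earlier summary() output.
long_range <- c(-72.92, -32.44)
lat_range  <- c(-33.688, 4.585)

# TRUE where both coordinates fall inside the observed ranges.
in_range <- patched$LONG >= long_range[1] & patched$LONG <= long_range[2] &
  patched$LAT >= lat_range[1] & patched$LAT <= lat_range[2]
print(in_range)
```

A swapped or sign-flipped pair would often fall outside these ranges, although the check cannot catch every mistake because the two ranges overlap.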

3.1.4.14 Replacing Missing Area Values

Source: https://en.wikipedia.org/wiki/Japur%C3%A1

Brazil_cities_allpop[is.na(Brazil_cities_allpop$AREA), ]
##      CITY_STATE   CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 2531  Japurá_AM Japurá    AM       0         7326              7318
##      IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 2531                 8    1043           583           460     3235     92
##      IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IDHM Ranking 2010  IDHM
## 2531      369      435        478       1764       97              5451 0.522
##      IDHM_Renda IDHM_Longevidade IDHM_Educacao     LONG       LAT   ALT AREA
## 2531      0.552            0.748         0.345 -66.9969 -1.880845 69.84   NA
##       RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC  GVA_TOTAL 
## 2531 Rural Remoto    16398.64       2146.9      9908.92    29244.3        57.7
##        TAXES   GDP POP_GDP GDP_CAPITA
## 2531 1489.89 59.19    4660   12701.43
##                                                                  GVA_MAIN
## 2531 Administração, defesa, educação e saúde públicas e seguridade social
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 2531       16      0      0      0      0      0      0     13      0      0
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 2531      0      0      0      0      1      2      0      0      0      0
##      COMP_U
## 2531      0
Brazil_cities_allpop$AREA[which(Brazil_cities_allpop$CITY_STATE == "Japurá_AM")] <- 55791
summary(Brazil_cities_allpop)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5565        Length:5565       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5559                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:0.000000   1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :0.000000   Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :0.004852   Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:0.000000   3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##                                                                             
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    0   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  472   1st Qu.:    2801  
##  Median :   3174   Median :   1844   Median :  919   Median :    6170  
##  Mean   :  10300   Mean   :   8856   Mean   : 1444   Mean   :   27595  
##  3rd Qu.:   6725   3rd Qu.:   4621   3rd Qu.: 1813   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##                                                                        
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##                                                                       
##    IBGE_15-59         IBGE_60+       IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3841   Median :    722   Median :2782      Median :0.6650  
##  Mean   :  18212   Mean   :   3004   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012   Max.   :5565      Max.   :0.8620  
##                                      NA's   :1         NA's   :1       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :1        NA's   :1        NA's   :1                        
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5565       
##  1st Qu.:-22.839   1st Qu.:   169.9   1st Qu.:   204.56   Class :character  
##  Median :-18.090   Median :   406.5   Median :   417.26   Mode  :character  
##  Mean   :-16.446   Mean   :   893.9   Mean   :  1525.15                     
##  3rd Qu.: -8.490   3rd Qu.:   629.1   3rd Qu.:  1026.38                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##                                                                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4193   1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17260  
##  Median :  20430   Median :    7425   Median :    31212   Median :   35809  
##  Mean   :  47263   Mean   :  176049   Mean   :   489855   Mean   :  123844  
##  3rd Qu.:  51238   3rd Qu.:   41011   3rd Qu.:   115521   3rd Qu.:   89316  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42254   1st Qu.:     1303   1st Qu.:    43706   1st Qu.:    5488  
##  Median :   119481   Median :     5107   Median :   125111   Median :   11584  
##  Mean   :   833592   Mean   :   118962   Mean   :   955266   Mean   :   37023  
##  3rd Qu.:   313988   3rd Qu.:    22209   3rd Qu.:   329717   3rd Qu.:   25102  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5565        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9062   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15866   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21119                      Mean   :   907.5   Mean   :  18.27  
##  3rd Qu.: 26155                      3rd Qu.:   449.0   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C            COMP_D             COMP_E      
##  Min.   :  0.000   Min.   :    0.0   Min.   :  0.0000   Min.   :  0.00  
##  1st Qu.:  0.000   1st Qu.:    3.0   1st Qu.:  0.0000   1st Qu.:  0.00  
##  Median :  0.000   Median :   11.0   Median :  0.0000   Median :  0.00  
##  Mean   :  1.853   Mean   :   73.5   Mean   :  0.4264   Mean   :  2.03  
##  3rd Qu.:  2.000   3rd Qu.:   39.0   3rd Qu.:  0.0000   3rd Qu.:  1.00  
##  Max.   :274.000   Max.   :31566.0   Max.   :332.0000   Max.   :657.00  
##                                                                         
##      COMP_F             COMP_G             COMP_H             COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00  
##  Median :    4.00   Median :    75.0   Median :    7.00   Median :    7.00  
##  Mean   :   43.29   Mean   :   348.2   Mean   :   41.02   Mean   :   55.92  
##  3rd Qu.:   15.00   3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515.00   Max.   :29290.00  
##                                                                             
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.76   Mean   :   15.56   Mean   :   15.15   Mean   :   51.34  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N             COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.00   Min.   :  1.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.00   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.77   Mean   :  3.271   Mean   :   30.98   Mean   :   34.18  
##  3rd Qu.:   14.00   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.00   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                            
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.19   Mean   :   51.65   Mean   :  0.05031  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 

3.1.4.15 Finding missing Santa Terezinha_BA Values

Brazil_cities_allpop[(is.na(Brazil_cities_allpop$IDHM))!=0,]
##              CITY_STATE            CITY STATE CAPITAL IBGE_RES_POP
## 4490 Santa Terezinha_BA Santa Terezinha    BA       0         9648
##      IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 4490              9648                 0    2891           734          2157
##      IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 4490     2332     40      126      191        217       1419      339
##      IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao     LONG
## 4490                NA   NA         NA               NA            NA -39.5184
##           LAT ALT   AREA     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 4490 -12.7498 210 719.26 Rural Adjacente     13235.2      5398.61     17754.37
##      GVA_PUBLIC  GVA_TOTAL    TAXES      GDP POP_GDP GDP_CAPITA
## 4490   32630.97    69019.14 3149.33 72168.48   10619    6796.16
##                                                                  GVA_MAIN
## 4490 Administração, defesa, educação e saúde públicas e seguridade social
##      COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 4490       74      2      1      4      0      0      3     37      0      3
##      COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 4490      1      0      0      1      2      2     12      2      0      4
##      COMP_U
## 4490      0

Unfortunately, we cannot use this data point because we are unable to fill in its missing Human Development Index (IDHM) values. For the purposes of this study, this row is therefore also excluded.

# Keep only the rows that have an IDHM value
Brazil_cities_cleaned <- Brazil_cities_allpop[!is.na(Brazil_cities_allpop$IDHM), ]
summary(Brazil_cities_cleaned)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5564        Length:5564       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5558                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR  
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.00  
##  1st Qu.:0.000000   1st Qu.:    5234   1st Qu.:    5228   1st Qu.:     0.00  
##  Median :0.000000   Median :   10935   Median :   10930   Median :     0.00  
##  Mean   :0.004853   Mean   :   34282   Mean   :   34205   Mean   :    77.52  
##  3rd Qu.:0.000000   3rd Qu.:   23446   3rd Qu.:   23392   3rd Qu.:    10.00  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.00  
##                                                                              
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL        IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    0.0   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  471.8   1st Qu.:    2802  
##  Median :   3174   Median :   1845   Median :  918.5   Median :    6174  
##  Mean   :  10301   Mean   :   8857   Mean   : 1443.8   Mean   :   27599  
##  3rd Qu.:   6726   3rd Qu.:   4622   3rd Qu.: 1813.0   3rd Qu.:   15303  
##  Max.   :3576148   Max.   :3548433   Max.   :33809.0   Max.   :10463636  
##                                                                          
##      IBGE_1            IBGE_1-4           IBGE_5-9        IBGE_10-14      
##  Min.   :     0.0   Min.   :     5.0   Min.   :     7   Min.   :    12.0  
##  1st Qu.:    38.0   1st Qu.:   158.0   1st Qu.:   220   1st Qu.:   259.8  
##  Median :    92.0   Median :   376.5   Median :   516   Median :   588.5  
##  Mean   :   383.3   Mean   :  1544.8   Mean   :  2070   Mean   :  2381.8  
##  3rd Qu.:   232.0   3rd Qu.:   951.2   3rd Qu.:  1300   3rd Qu.:  1478.2  
##  Max.   :129464.0   Max.   :514794.0   Max.   :684443   Max.   :783702.0  
##                                                                           
##    IBGE_15-59         IBGE_60+         IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29.0   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1735   1st Qu.:    341.0   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3842   Median :    722.5   Median :2782      Median :0.6650  
##  Mean   :  18215   Mean   :   3004.7   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9629   3rd Qu.:   1724.2   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012.0   Max.   :5565      Max.   :0.8620  
##                                                                          
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##                                                                     
##       LAT               ALT                AREA           RURAL_URBAN       
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Length:5564       
##  1st Qu.:-22.839   1st Qu.:   169.8   1st Qu.:   204.53   Class :character  
##  Median :-18.091   Median :   406.5   Median :   416.59   Mode  :character  
##  Mean   :-16.447   Mean   :   894.0   Mean   :  1525.29                     
##  3rd Qu.: -8.489   3rd Qu.:   629.1   3rd Qu.:  1026.44                     
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33                     
##                                                                             
##   GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC      
##  Min.   :      0   Min.   :       1   Min.   :        2   Min.   :       7  
##  1st Qu.:   4192   1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17258  
##  Median :  20432   Median :    7428   Median :    31214   Median :   35837  
##  Mean   :  47270   Mean   :  176080   Mean   :   489940   Mean   :  123860  
##  3rd Qu.:  51239   3rd Qu.:   41015   3rd Qu.:   115552   3rd Qu.:   89328  
##  Max.   :1402282   Max.   :63306755   Max.   :464656988   Max.   :41902893  
##                                                                             
##    GVA_TOTAL             TAXES                GDP               POP_GDP        
##  Min.   :       17   Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:    42254   1st Qu.:     1302   1st Qu.:    43691   1st Qu.:    5486  
##  Median :   119492   Median :     5108   Median :   125153   Median :   11584  
##  Mean   :   833729   Mean   :   118983   Mean   :   955425   Mean   :   37028  
##  3rd Qu.:   314039   3rd Qu.:    22219   3rd Qu.:   329733   3rd Qu.:   25105  
##  Max.   :569910503   Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                                                
##    GDP_CAPITA       GVA_MAIN            COMP_TOT            COMP_A       
##  Min.   :  3191   Length:5564        Min.   :     6.0   Min.   :   0.00  
##  1st Qu.:  9062   Class :character   1st Qu.:    68.0   1st Qu.:   1.00  
##  Median : 15870   Mode  :character   Median :   162.0   Median :   2.00  
##  Mean   : 21122                      Mean   :   907.6   Mean   :  18.27  
##  3rd Qu.: 26155                      3rd Qu.:   449.2   3rd Qu.:   8.00  
##  Max.   :314638                      Max.   :530446.0   Max.   :1948.00  
##                                                                          
##      COMP_B            COMP_C             COMP_D             COMP_E       
##  Min.   :  0.000   Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000  
##  1st Qu.:  0.000   1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000  
##  Median :  0.000   Median :   11.00   Median :  0.0000   Median :  0.000  
##  Mean   :  1.853   Mean   :   73.51   Mean   :  0.4265   Mean   :  2.031  
##  3rd Qu.:  2.000   3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000  
##  Max.   :274.000   Max.   :31566.00   Max.   :332.0000   Max.   :657.000  
##                                                                           
##      COMP_F             COMP_G             COMP_H             COMP_I        
##  Min.   :    0.00   Min.   :     1.0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00  
##  Median :    4.00   Median :    75.0   Median :    7.00   Median :    7.00  
##  Mean   :   43.29   Mean   :   348.3   Mean   :   41.03   Mean   :   55.93  
##  3rd Qu.:   15.00   3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00  
##  Max.   :25222.00   Max.   :150633.0   Max.   :19515.00   Max.   :29290.00  
##                                                                             
##      COMP_J             COMP_K             COMP_L             COMP_M        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00  
##  Median :    1.00   Median :    0.00   Median :    0.00   Median :    4.00  
##  Mean   :   24.77   Mean   :   15.57   Mean   :   15.15   Mean   :   51.34  
##  3rd Qu.:    5.00   3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00  
##  Max.   :38720.00   Max.   :23738.00   Max.   :14003.00   Max.   :49181.00  
##                                                                             
##      COMP_N             COMP_O            COMP_P             COMP_Q        
##  Min.   :    0.00   Min.   :  1.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00  
##  Median :    4.00   Median :  2.000   Median :    6.00   Median :    3.00  
##  Mean   :   83.78   Mean   :  3.271   Mean   :   30.98   Mean   :   34.18  
##  3rd Qu.:   14.00   3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00  
##  Max.   :76757.00   Max.   :204.000   Max.   :16030.00   Max.   :22248.00  
##                                                                            
##      COMP_R            COMP_S             COMP_U         
##  Min.   :   0.00   Min.   :    0.00   Min.   :  0.00000  
##  1st Qu.:   0.00   1st Qu.:    5.00   1st Qu.:  0.00000  
##  Median :   2.00   Median :   12.00   Median :  0.00000  
##  Mean   :  12.19   Mean   :   51.66   Mean   :  0.05032  
##  3rd Qu.:   6.00   3rd Qu.:   31.00   3rd Qu.:  0.00000  
##  Max.   :6687.00   Max.   :24832.00   Max.   :123.00000  
## 

3.1.4.16 Summary of Data Cleaning

Overall, we lost a total of 9 rows during data cleaning: 3 were missing the dependent variable (GDP per capita), 5 were missing a large number of variables, and 1 was missing its IDHM values.

We also reduced the number of variables from 81 to 59. We added 1 variable as a unique identifier combining city and state (CITY_STATE), removed 22 variables because their data were recorded after the year of our dependent variable (2016), and removed 1 variable because a large portion of its values were missing.

3.1.5 Data Processing

To formulate our indicators, we need to create some derived variables so that the indicators in our explanatory model are not correlated with one another, or with the dependent variable, through a shared underlying factor. Since our dependent variable is divided by population, we need to rescale any values that themselves depend on population size.

We will take the following approaches:

  1. Using ratios rather than counts for metrics where we have totals, e.g. foreign resident population / total resident population.

  2. Dividing values by POP_GDP, the population figure used to calculate GDP per capita.
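
The two approaches listed above can be sketched on a toy data frame as follows. All numbers are made up, and `TAXES_PER_CAPITA` is a hypothetical column name used only for illustration; it is not necessarily a variable derived in this analysis.

```r
# Toy municipalities (all numbers are made up for illustration)
toy <- data.frame(
  IBGE_RES_POP      = c(1000, 20000, 500000),
  IBGE_RES_POP_ESTR = c(10, 100, 10000),
  TAXES             = c(50, 4000, 90000),
  POP_GDP           = c(1100, 21000, 520000)
)

# Approach 1: share of a total (foreign residents / all residents)
toy$RES_FOREIGN_POP_RATIO <- toy$IBGE_RES_POP_ESTR / toy$IBGE_RES_POP

# Approach 2: per-capita value, using the same population that defines GDP per capita
toy$TAXES_PER_CAPITA <- toy$TAXES / toy$POP_GDP   # hypothetical derived column
```

Both derived columns are independent of the raw size of the municipality, which is the property we want when the dependent variable is itself a per-capita measure.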

3.1.5.1 Categorical Data Handling

We can derive more variables for our analysis by converting categorical variables into binary arrays. This will allow us to retain our categorical variables during our regression by making them into dummy variables.

Examining GVA_MAIN

# Refer to the column by name rather than by position (column 37 is GVA_MAIN)
unique(Brazil_cities_cleaned$GVA_MAIN)
##  [1] "Demais serviços"                                                                     
##  [2] "Administração, defesa, educação e saúde públicas e seguridade social"                
##  [3] "Agricultura, inclusive apoio à agricultura e a pós colheita"                         
##  [4] "Indústrias de transformação"                                                         
##  [5] "Pecuária, inclusive apoio à pecuária"                                                
##  [6] "Eletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação"
##  [7] "Comércio e reparação de veículos automotores e motocicletas"                         
##  [8] "Indústrias extrativas"                                                               
##  [9] "Construção"                                                                          
## [10] "Produção florestal, pesca e aquicultura"

Examining RURAL_URBAN

# Refer to the column by name rather than by position (column 27 is RURAL_URBAN)
unique(Brazil_cities_cleaned$RURAL_URBAN)
## [1] "Urbano"                  "Rural Adjacente"        
## [3] "Rural Remoto"            "Intermediário Adjacente"
## [5] "Intermediário Remoto"

Creating Dummy Variable Arrays

Brazil_cities_CAT <- cbind(Brazil_cities_cleaned, as.data.frame(with(Brazil_cities_cleaned, model.matrix(~ RURAL_URBAN + 0))))
Brazil_cities_CAT <- cbind(Brazil_cities_CAT, as.data.frame(with(Brazil_cities_cleaned, model.matrix(~ GVA_MAIN + 0))))
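
To make the effect of `model.matrix(~ x + 0)` concrete, here is a minimal sketch on a toy data frame that uses three of the RURAL_URBAN levels. The toy rows and the names `dummies` and `dummies_ref` are illustrative assumptions. Whether a reference level is dropped later in this analysis is not shown in this section, so the last step is only a common convention and not a description of the author's method.

```r
# Toy categorical variable using three of the RURAL_URBAN levels
toy <- data.frame(
  RURAL_URBAN = c("Urbano", "Rural Remoto", "Rural Adjacente", "Urbano")
)

# "+ 0" removes the intercept, so every level gets its own 0/1 column
dummies <- as.data.frame(model.matrix(~ RURAL_URBAN + 0, data = toy))
names(dummies)
rowSums(dummies)   # each row contains exactly one 1

# If the regression keeps an intercept, a full set of dummies is perfectly
# collinear with it, so one level is usually left out as the baseline
dummies_ref <- dummies[, -1, drop = FALSE]
```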

Dropping Categorical Columns

dropCategorical <- c("GVA_MAIN", "RURAL_URBAN")

Brazil_cities_withDummy <- Brazil_cities_CAT[ , !(names(Brazil_cities_CAT) %in% dropCategorical)]

3.1.5.2 Building Multiple Ratios

To control for differences in population, we use ratios instead of raw counts. This gives a better picture of the makeup of each municipality.

3.1.5.2.1 Reworking GVA Totals

After examining the data and its source, there appears to be an error in the GVA totals, which would greatly affect our GVA ratios. On inspection of the source data, all the other GVA values are correct; only the totals are wrong. It is not clear where the values in the totals come from, so we replace them by summing the values of the individual GVA categories to form new GVA totals.
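
A mismatch of this kind can be flagged with a simple row-wise comparison. The sketch below is an assumption-laden illustration: the first toy row copies the Japurá_AM values printed earlier, the second row is made up, the toy column is named `GVA_TOTAL` without the surrounding spaces found in the real data, and the 1% tolerance is an arbitrary choice.

```r
# First row mirrors the Japurá_AM record shown earlier; second row is made up
gva <- data.frame(
  GVA_AGROPEC  = c(16398.64, 100),
  GVA_INDUSTRY = c(2146.90, 200),
  GVA_SERVICES = c(9908.92, 300),
  GVA_PUBLIC   = c(29244.30, 400),
  GVA_TOTAL    = c(57.70, 1000)
)

parts <- c("GVA_AGROPEC", "GVA_INDUSTRY", "GVA_SERVICES", "GVA_PUBLIC")

# Recompute the total from its components
gva$GVA_SUM <- rowSums(gva[, parts])

# Flag rows where the reported total differs from the sum by more than 1%
gva$MISMATCH <- abs(gva$GVA_SUM - gva$GVA_TOTAL) > 0.01 * gva$GVA_SUM
```

Running `sum(gva$MISMATCH)` on the full dataset would show how many municipalities are affected before the totals are overwritten.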

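# Recompute the GVA total as the sum of the four GVA components
# (columns 27 to 30; the name ` GVA_TOTAL ` carries surrounding spaces in the data)
Brazil_cities_withDummy <- Brazil_cities_withDummy %>%
  mutate(` GVA_TOTAL ` = rowSums(.[c("GVA_AGROPEC", "GVA_INDUSTRY", "GVA_SERVICES", "GVA_PUBLIC")]))
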
Brazil_cities_Derived <- Brazil_cities_withDummy %>%
  # Brazilian vs. foreign resident population
  mutate(RES_BRAZ_POP_RATIO = ifelse((IBGE_RES_POP_BRAS == 0), 0, (IBGE_RES_POP_BRAS/IBGE_RES_POP))) %>%
  mutate(RES_FOREIGN_POP_RATIO = ifelse((IBGE_RES_POP_ESTR == 0), 0, (IBGE_RES_POP_ESTR/IBGE_RES_POP))) %>%
  # Rural vs Urban Domestic Units
  mutate(DOM_URBAN_RATIO = ifelse((IBGE_DU_URBAN == 0), 0, (IBGE_DU_URBAN/IBGE_DU)))%>%
  mutate(DOM_RURAL_RATIO = ifelse((IBGE_DU_RURAL == 0), 0, (IBGE_DU_RURAL/IBGE_DU)))%>%
  # Residential Population Age Ratios
  mutate(POP_BEL_ONE_RATIO = ifelse((IBGE_1 == 0), 0, (IBGE_1/IBGE_POP)))%>%
  mutate(POP_ONE_to_FOUR_RATIO = ifelse((`IBGE_1-4` == 0), 0, (`IBGE_1-4`/IBGE_POP)))%>%
  mutate(POP_FIVE_to_NINE_RATIO = ifelse((`IBGE_5-9` == 0), 0, (`IBGE_5-9`/IBGE_POP)))%>%
  mutate(POP_TEN_to_FOURTEEN_RATIO = ifelse((`IBGE_10-14` == 0), 0, (`IBGE_10-14`/IBGE_POP)))%>%
  mutate(POP_WORKING_RATIO = ifelse((`IBGE_15-59` == 0), 0, (`IBGE_15-59`/IBGE_POP))) %>%
  mutate(POP_ELDERLY_RATIO = ifelse((`IBGE_60+` == 0), 0, (`IBGE_60+`/IBGE_POP)))%>%
  # Gross Value Added (GVA) ratios
  mutate(GVA_AGROPEC_RATIO = ifelse((GVA_AGROPEC == 0), 0, (GVA_AGROPEC/as.numeric(` GVA_TOTAL `))))%>%
  mutate(GVA_INDUSTRY_RATIO = ifelse((GVA_INDUSTRY == 0), 0, (GVA_INDUSTRY/as.numeric(` GVA_TOTAL `))))%>%
  mutate(GVA_SERVICES_RATIO = ifelse((GVA_SERVICES == 0), 0, (GVA_SERVICES/as.numeric(` GVA_TOTAL `))))%>%
  mutate(GVA_PUBLIC_RATIO = ifelse((GVA_PUBLIC == 0), 0, (GVA_PUBLIC/as.numeric(` GVA_TOTAL `))))%>%
  # Company Ratios
  mutate(COM_A_RATIO = ifelse((COMP_A == 0), 0, (COMP_A/COMP_TOT)))%>%
  mutate(COM_B_RATIO = ifelse((COMP_B == 0), 0, (COMP_B/COMP_TOT)))%>%
  mutate(COM_C_RATIO = ifelse((COMP_C == 0), 0, (COMP_C/COMP_TOT)))%>%
  mutate(COM_D_RATIO = ifelse((COMP_D == 0), 0, (COMP_D/COMP_TOT)))%>%
  mutate(COM_E_RATIO = ifelse((COMP_E == 0), 0, (COMP_E/COMP_TOT)))%>%
  mutate(COM_F_RATIO = ifelse((COMP_F == 0), 0, (COMP_F/COMP_TOT)))%>%
  mutate(COM_G_RATIO = ifelse((COMP_G == 0), 0, (COMP_G/COMP_TOT)))%>%
  mutate(COM_H_RATIO = ifelse((COMP_H == 0), 0, (COMP_H/COMP_TOT)))%>%
  mutate(COM_I_RATIO = ifelse((COMP_I == 0), 0, (COMP_I/COMP_TOT)))%>%
  mutate(COM_J_RATIO = ifelse((COMP_J == 0), 0, (COMP_J/COMP_TOT)))%>%
  mutate(COM_K_RATIO = ifelse((COMP_K == 0), 0, (COMP_K/COMP_TOT)))%>%
  mutate(COM_L_RATIO = ifelse((COMP_L == 0), 0, (COMP_L/COMP_TOT)))%>%
  mutate(COM_M_RATIO = ifelse((COMP_M == 0), 0, (COMP_M/COMP_TOT)))%>%
  mutate(COM_N_RATIO = ifelse((COMP_N == 0), 0, (COMP_N/COMP_TOT)))%>%
  mutate(COM_O_RATIO = ifelse((COMP_O == 0), 0, (COMP_O/COMP_TOT)))%>%
  mutate(COM_P_RATIO = ifelse((COMP_P == 0), 0, (COMP_P/COMP_TOT)))%>%
  mutate(COM_Q_RATIO = ifelse((COMP_Q == 0), 0, (COMP_Q/COMP_TOT)))%>%
  mutate(COM_R_RATIO = ifelse((COMP_R == 0), 0, (COMP_R/COMP_TOT)))%>%
  mutate(COM_S_RATIO = ifelse((COMP_S == 0), 0, (COMP_S/COMP_TOT)))%>%
  mutate(COM_U_RATIO = ifelse((COMP_U == 0), 0, (COMP_U/COMP_TOT)))
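
The same zero-guarded ratio pattern could be written more compactly with a helper function. This is a sketch in base R on made-up data, not the code used in this analysis; `safe_ratio`, `comp_cols` and `ratio_names` are hypothetical names introduced for illustration.

```r
# Toy company counts (made-up values); the second row has no companies at all
toy <- data.frame(COMP_TOT = c(10, 0), COMP_A = c(2, 0), COMP_B = c(0, 0))

# Hypothetical helper: 0 when the numerator is 0, otherwise numerator / denominator.
# The guard avoids NaN from 0 / 0 when both the count and the total are zero.
safe_ratio <- function(num, den) ifelse(num == 0, 0, num / den)

comp_cols   <- c("COMP_A", "COMP_B")
ratio_names <- sub("^COMP_", "COM_", paste0(comp_cols, "_RATIO"))

# Apply the helper to every count column and store the results under the new names
toy[ratio_names] <- lapply(toy[comp_cols], safe_ratio, den = toy$COMP_TOT)
```

Listing the count columns once keeps the naming consistent and makes it harder to mistype a single ratio among the twenty company categories.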

3.1.5.3 Creating Population Density Indicator

Brazil_cities_Derived <-  Brazil_cities_Derived %>%
   mutate(POP_DENSITY = POP_GDP/AREA)
summary(Brazil_cities_Derived)
##                   CITY_STATE       CITY              STATE          
##  Abadia De Goiás_GO    :   1   Length:5564        Length:5564       
##  Abadia Dos Dourados_MG:   1   Class :character   Class :character  
##  Abadiânia_GO          :   1   Mode  :character   Mode  :character  
##  Abaeté_MG             :   1                                        
##  Abaetetuba_PA         :   1                                        
##  Abaiara_CE            :   1                                        
##  (Other)               :5558                                        
##     CAPITAL          IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR  
##  Min.   :0.000000   Min.   :     805   Min.   :     805   Min.   :     0.00  
##  1st Qu.:0.000000   1st Qu.:    5234   1st Qu.:    5228   1st Qu.:     0.00  
##  Median :0.000000   Median :   10935   Median :   10930   Median :     0.00  
##  Mean   :0.004853   Mean   :   34282   Mean   :   34205   Mean   :    77.52  
##  3rd Qu.:0.000000   3rd Qu.:   23446   3rd Qu.:   23392   3rd Qu.:    10.00  
##  Max.   :1.000000   Max.   :11253503   Max.   :11133776   Max.   :119727.00  
##                                                                              
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL        IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    0.0   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  471.8   1st Qu.:    2802  
##  Median :   3174   Median :   1845   Median :  918.5   Median :    6174  
##  Mean   :  10301   Mean   :   8857   Mean   : 1443.8   Mean   :   27599  
##  3rd Qu.:   6726   3rd Qu.:   4622   3rd Qu.: 1813.0   3rd Qu.:   15303  
##  Max.   :3576148   Max.   :3548433   Max.   :33809.0   Max.   :10463636  
##                                                                          
##      IBGE_1            IBGE_1-4           IBGE_5-9        IBGE_10-14      
##  Min.   :     0.0   Min.   :     5.0   Min.   :     7   Min.   :    12.0  
##  1st Qu.:    38.0   1st Qu.:   158.0   1st Qu.:   220   1st Qu.:   259.8  
##  Median :    92.0   Median :   376.5   Median :   516   Median :   588.5  
##  Mean   :   383.3   Mean   :  1544.8   Mean   :  2070   Mean   :  2381.8  
##  3rd Qu.:   232.0   3rd Qu.:   951.2   3rd Qu.:  1300   3rd Qu.:  1478.2  
##  Max.   :129464.0   Max.   :514794.0   Max.   :684443   Max.   :783702.0  
##                                                                           
##    IBGE_15-59         IBGE_60+         IDHM Ranking 2010      IDHM       
##  Min.   :     94   Min.   :     29.0   Min.   :   1      Min.   :0.4180  
##  1st Qu.:   1735   1st Qu.:    341.0   1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3842   Median :    722.5   Median :2782      Median :0.6650  
##  Mean   :  18215   Mean   :   3004.7   Mean   :2783      Mean   :0.6592  
##  3rd Qu.:   9629   3rd Qu.:   1724.2   3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :7058221   Max.   :1293012.0   Max.   :5565      Max.   :0.8620  
##                                                                          
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##                                                                     
##       LAT               ALT                AREA            GVA_AGROPEC     
##  Min.   :-33.688   Min.   :     0.0   Min.   :     3.57   Min.   :      0  
##  1st Qu.:-22.839   1st Qu.:   169.8   1st Qu.:   204.53   1st Qu.:   4192  
##  Median :-18.091   Median :   406.5   Median :   416.59   Median :  20432  
##  Mean   :-16.447   Mean   :   894.0   Mean   :  1525.29   Mean   :  47270  
##  3rd Qu.: -8.489   3rd Qu.:   629.1   3rd Qu.:  1026.44   3rd Qu.:  51239  
##  Max.   :  4.585   Max.   :874579.0   Max.   :159533.33   Max.   :1402282  
##                                                                            
##   GVA_INDUSTRY       GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL        
##  Min.   :       1   Min.   :        2   Min.   :       7   Min.   :      128  
##  1st Qu.:    1725   1st Qu.:    10113   1st Qu.:   17258   1st Qu.:    57490  
##  Median :    7428   Median :    31214   Median :   35837   Median :   124586  
##  Mean   :  176080   Mean   :   489940   Mean   :  123860   Mean   :   837149  
##  3rd Qu.:   41015   3rd Qu.:   115552   3rd Qu.:   89328   3rd Qu.:   352878  
##  Max.   :63306755   Max.   :464656988   Max.   :41902893   Max.   :569910503  
##                                                                               
##      TAXES                GDP               POP_GDP           GDP_CAPITA    
##  Min.   :   -14159   Min.   :       15   Min.   :     815   Min.   :  3191  
##  1st Qu.:     1302   1st Qu.:    43691   1st Qu.:    5486   1st Qu.:  9062  
##  Median :     5108   Median :   125153   Median :   11584   Median : 15870  
##  Mean   :   118983   Mean   :   955425   Mean   :   37028   Mean   : 21122  
##  3rd Qu.:    22219   3rd Qu.:   329733   3rd Qu.:   25105   3rd Qu.: 26155  
##  Max.   :117125387   Max.   :687035890   Max.   :12038175   Max.   :314638  
##                                                                             
##     COMP_TOT            COMP_A            COMP_B            COMP_C        
##  Min.   :     6.0   Min.   :   0.00   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000   1st Qu.:    3.00  
##  Median :   162.0   Median :   2.00   Median :  0.000   Median :   11.00  
##  Mean   :   907.6   Mean   :  18.27   Mean   :  1.853   Mean   :   73.51  
##  3rd Qu.:   449.2   3rd Qu.:   8.00   3rd Qu.:  2.000   3rd Qu.:   39.00  
##  Max.   :530446.0   Max.   :1948.00   Max.   :274.000   Max.   :31566.00  
##                                                                           
##      COMP_D             COMP_E            COMP_F             COMP_G        
##  Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00   Min.   :     1.0  
##  1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00   1st Qu.:    32.0  
##  Median :  0.0000   Median :  0.000   Median :    4.00   Median :    75.0  
##  Mean   :  0.4265   Mean   :  2.031   Mean   :   43.29   Mean   :   348.3  
##  3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00   3rd Qu.:   200.0  
##  Max.   :332.0000   Max.   :657.000   Max.   :25222.00   Max.   :150633.0  
##                                                                            
##      COMP_H             COMP_I             COMP_J             COMP_K        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1.00   1st Qu.:    2.00   1st Qu.:    0.00   1st Qu.:    0.00  
##  Median :    7.00   Median :    7.00   Median :    1.00   Median :    0.00  
##  Mean   :   41.03   Mean   :   55.93   Mean   :   24.77   Mean   :   15.57  
##  3rd Qu.:   25.00   3rd Qu.:   24.00   3rd Qu.:    5.00   3rd Qu.:    2.00  
##  Max.   :19515.00   Max.   :29290.00   Max.   :38720.00   Max.   :23738.00  
##                                                                             
##      COMP_L             COMP_M             COMP_N             COMP_O       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :  1.000  
##  1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.00   1st Qu.:  2.000  
##  Median :    0.00   Median :    4.00   Median :    4.00   Median :  2.000  
##  Mean   :   15.15   Mean   :   51.34   Mean   :   83.78   Mean   :  3.271  
##  3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.00   3rd Qu.:  3.000  
##  Max.   :14003.00   Max.   :49181.00   Max.   :76757.00   Max.   :204.000  
##                                                                            
##      COMP_P             COMP_Q             COMP_R            COMP_S        
##  Min.   :    0.00   Min.   :    0.00   Min.   :   0.00   Min.   :    0.00  
##  1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00   1st Qu.:    5.00  
##  Median :    6.00   Median :    3.00   Median :   2.00   Median :   12.00  
##  Mean   :   30.98   Mean   :   34.18   Mean   :  12.19   Mean   :   51.66  
##  3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00   3rd Qu.:   31.00  
##  Max.   :16030.00   Max.   :22248.00   Max.   :6687.00   Max.   :24832.00  
##                                                                            
##      COMP_U          RURAL_URBANIntermediário Adjacente
##  Min.   :  0.00000   Min.   :0.0000                    
##  1st Qu.:  0.00000   1st Qu.:0.0000                    
##  Median :  0.00000   Median :0.0000                    
##  Mean   :  0.05032   Mean   :0.1233                    
##  3rd Qu.:  0.00000   3rd Qu.:0.0000                    
##  Max.   :123.00000   Max.   :1.0000                    
##                                                        
##  RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
##  Min.   :0.00000                 Min.   :0.0000            
##  1st Qu.:0.00000                 1st Qu.:0.0000            
##  Median :0.00000                 Median :1.0000            
##  Mean   :0.01078                 Mean   :0.5462            
##  3rd Qu.:0.00000                 3rd Qu.:1.0000            
##  Max.   :1.00000                 Max.   :1.0000            
##                                                            
##  RURAL_URBANRural Remoto RURAL_URBANUrbano
##  Min.   :0.00000         Min.   :0.0000   
##  1st Qu.:0.00000         1st Qu.:0.0000   
##  Median :0.00000         Median :0.0000   
##  Mean   :0.05805         Mean   :0.2617   
##  3rd Qu.:0.00000         3rd Qu.:1.0000   
##  Max.   :1.00000         Max.   :1.0000   
##                                           
##  GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
##  Min.   :0.0000                                                              
##  1st Qu.:0.0000                                                              
##  Median :0.0000                                                              
##  Mean   :0.4892                                                              
##  3rd Qu.:1.0000                                                              
##  Max.   :1.0000                                                              
##                                                                              
##  GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
##  Min.   :0.0000                                                     
##  1st Qu.:0.0000                                                     
##  Median :0.0000                                                     
##  Mean   :0.1317                                                     
##  3rd Qu.:0.0000                                                     
##  Max.   :1.0000                                                     
##                                                                     
##  GVA_MAINComércio e reparação de veículos automotores e motocicletas
##  Min.   :0.000000                                                   
##  1st Qu.:0.000000                                                   
##  Median :0.000000                                                   
##  Mean   :0.008267                                                   
##  3rd Qu.:0.000000                                                   
##  Max.   :1.000000                                                   
##                                                                     
##  GVA_MAINConstrução GVA_MAINDemais serviços
##  Min.   :0.000000   Min.   :0.0000         
##  1st Qu.:0.000000   1st Qu.:0.0000         
##  Median :0.000000   Median :0.0000         
##  Mean   :0.001258   Mean   :0.2653         
##  3rd Qu.:0.000000   3rd Qu.:1.0000         
##  Max.   :1.000000   Max.   :1.0000         
##                                            
##  GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
##  Min.   :0.00000                                                                             
##  1st Qu.:0.00000                                                                             
##  Median :0.00000                                                                             
##  Mean   :0.01761                                                                             
##  3rd Qu.:0.00000                                                                             
##  Max.   :1.00000                                                                             
##                                                                                              
##  GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
##  Min.   :0.00000                     Min.   :0.00000              
##  1st Qu.:0.00000                     1st Qu.:0.00000              
##  Median :0.00000                     Median :0.00000              
##  Mean   :0.04691                     Mean   :0.00629              
##  3rd Qu.:0.00000                     3rd Qu.:0.00000              
##  Max.   :1.00000                     Max.   :1.00000              
##                                                                   
##  GVA_MAINPecuária, inclusive apoio à pecuária
##  Min.   :0.00000                             
##  1st Qu.:0.00000                             
##  Median :0.00000                             
##  Mean   :0.02894                             
##  3rd Qu.:0.00000                             
##  Max.   :1.00000                             
##                                              
##  GVA_MAINProdução florestal, pesca e aquicultura RES_BRAZ_POP_RATIO
##  Min.   :0.000000                                Min.   :0.6228    
##  1st Qu.:0.000000                                1st Qu.:0.9993    
##  Median :0.000000                                Median :1.0000    
##  Mean   :0.004493                                Mean   :0.9992    
##  3rd Qu.:0.000000                                3rd Qu.:1.0000    
##  Max.   :1.000000                                Max.   :1.0000    
##                                                                    
##  RES_FOREIGN_POP_RATIO DOM_URBAN_RATIO   DOM_RURAL_RATIO  POP_BEL_ONE_RATIO
##  Min.   :0.0000000     Min.   :0.04553   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.0000000     1st Qu.:0.49148   1st Qu.:0.1696   1st Qu.:0.01209  
##  Median :0.0000000     Median :0.66263   Median :0.3374   Median :0.01418  
##  Mean   :0.0007593     Mean   :0.65205   Mean   :0.3479   Mean   :0.01445  
##  3rd Qu.:0.0006992     3rd Qu.:0.83040   3rd Qu.:0.5085   3rd Qu.:0.01651  
##  Max.   :0.3772182     Max.   :1.00000   Max.   :0.9545   Max.   :0.03314  
##                                                                            
##  POP_ONE_to_FOUR_RATIO POP_FIVE_to_NINE_RATIO POP_TEN_to_FOURTEEN_RATIO
##  Min.   :0.01008       Min.   :0.02482        Min.   :0.03491          
##  1st Qu.:0.05018       1st Qu.:0.06942        1st Qu.:0.08227          
##  Median :0.05826       Median :0.08012        Median :0.09207          
##  Mean   :0.05951       Mean   :0.08169        Mean   :0.09344          
##  3rd Qu.:0.06717       3rd Qu.:0.09180        3rd Qu.:0.10346          
##  Max.   :0.11881       Max.   :0.16247        Max.   :0.16649          
##                                                                        
##  POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO 
##  Min.   :0.4716    Min.   :0.02255   Min.   :0.00000   Min.   :0.0000157  
##  1st Qu.:0.6087    1st Qu.:0.09799   1st Qu.:0.03364   1st Qu.:0.0368730  
##  Median :0.6325    Median :0.11921   Median :0.15062   Median :0.0714602  
##  Mean   :0.6308    Mean   :0.12009   Mean   :0.21034   Mean   :0.1377745  
##  3rd Qu.:0.6543    3rd Qu.:0.14103   3rd Qu.:0.34094   3rd Qu.:0.1795132  
##  Max.   :0.7448    Max.   :0.42199   Max.   :0.99877   Max.   :0.9991868  
##                                                                           
##  GVA_SERVICES_RATIO  GVA_PUBLIC_RATIO     COM_A_RATIO        COM_B_RATIO      
##  Min.   :0.0000461   Min.   :0.0000433   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.1985910   1st Qu.:0.1448472   1st Qu.:0.001569   1st Qu.:0.000000  
##  Median :0.3117002   Median :0.2948082   Median :0.011803   Median :0.000000  
##  Mean   :0.3260963   Mean   :0.3257928   Mean   :0.039408   Mean   :0.006019  
##  3rd Qu.:0.4600063   3rd Qu.:0.4966551   3rd Qu.:0.031915   3rd Qu.:0.005188  
##  Max.   :0.9995977   Max.   :0.9996029   Max.   :0.917085   Max.   :0.333333  
##                                                                               
##   COM_C_RATIO       COM_D_RATIO         COM_E_RATIO        COM_F_RATIO     
##  Min.   :0.00000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.03636   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.01389  
##  Median :0.06590   Median :0.0000000   Median :0.000000   Median :0.02778  
##  Mean   :0.07967   Mean   :0.0007847   Mean   :0.002508   Mean   :0.03130  
##  3rd Qu.:0.10593   3rd Qu.:0.0000000   3rd Qu.:0.003226   3rd Qu.:0.04348  
##  Max.   :0.54518   Max.   :0.4444444   Max.   :0.083333   Max.   :0.29213  
##                                                                            
##   COM_G_RATIO       COM_H_RATIO       COM_I_RATIO       COM_J_RATIO      
##  Min.   :0.01789   Min.   :0.00000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.38980   1st Qu.:0.01562   1st Qu.:0.02128   1st Qu.:0.000000  
##  Median :0.46396   Median :0.03757   Median :0.04167   Median :0.007299  
##  Mean   :0.47234   Mean   :0.04955   Mean   :0.04567   Mean   :0.009054  
##  3rd Qu.:0.55263   3rd Qu.:0.07052   3rd Qu.:0.06202   3rd Qu.:0.013982  
##  Max.   :0.89091   Max.   :0.43689   Max.   :0.52542   Max.   :0.417249  
##                                                                          
##   COM_K_RATIO        COM_L_RATIO        COM_M_RATIO       COM_N_RATIO     
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.01144   1st Qu.:0.01802  
##  Median :0.000000   Median :0.000000   Median :0.02362   Median :0.02924  
##  Mean   :0.003933   Mean   :0.005450   Mean   :0.02536   Mean   :0.03553  
##  3rd Qu.:0.006112   3rd Qu.:0.008601   3rd Qu.:0.03659   3rd Qu.:0.04496  
##  Max.   :0.087912   Max.   :0.156863   Max.   :0.24444   Max.   :0.33527  
##                                                                           
##   COM_O_RATIO         COM_P_RATIO       COM_Q_RATIO        COM_R_RATIO      
##  Min.   :0.0001764   Min.   :0.00000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.0058954   1st Qu.:0.01786   1st Qu.:0.006615   1st Qu.:0.000000  
##  Median :0.0153846   Median :0.02985   Median :0.019946   Median :0.009091  
##  Mean   :0.0277867   Mean   :0.04350   Mean   :0.022028   Mean   :0.010772  
##  3rd Qu.:0.0361664   3rd Qu.:0.04878   3rd Qu.:0.033033   3rd Qu.:0.015310  
##  Max.   :0.3636364   Max.   :0.83673   Max.   :0.214286   Max.   :0.166667  
##                                                                             
##   COM_S_RATIO       COM_U_RATIO         POP_DENSITY       
##  Min.   :0.00000   Min.   :0.000e+00   Min.   :    0.084  
##  1st Qu.:0.04116   1st Qu.:0.000e+00   1st Qu.:   11.939  
##  Median :0.06395   Median :0.000e+00   Median :   25.306  
##  Mean   :0.08933   Mean   :2.036e-06   Mean   :  117.271  
##  3rd Qu.:0.11147   3rd Qu.:0.000e+00   3rd Qu.:   55.443  
##  Max.   :0.56716   Max.   :2.985e-03   Max.   :13533.497  
## 

The data looks good, though we should pay attention to the ratios whose maximum value is less than 1. It would be prudent not to normalize them.
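One way to list the ratios in question, rather than reading them off the summary by eye, is sketched below. The `toy` data frame and its values are made up for illustration; only the `_RATIO` naming pattern is taken from the summary above.

```r
# Toy stand-in for Brazil_cities_Derived (hypothetical values, illustration only)
toy <- data.frame(
  DOM_URBAN_RATIO = c(0.05, 0.66, 1.00),
  COM_B_RATIO     = c(0.00, 0.01, 0.33),
  POP_DENSITY     = c(0.08, 25.3, 13533.5)
)

# Pick the ratio columns by their name suffix
ratio_cols <- grep("_RATIO$", names(toy), value = TRUE)

# Maximum of each ratio column
ratio_max <- sapply(toy[ratio_cols], max, na.rm = TRUE)

# Ratios whose maximum never reaches 1
below_one <- names(ratio_max)[ratio_max < 1]

print(ratio_max)
print(below_one)
```

Applied to the real data frame, `below_one` would hold the names of the ratio columns to leave un-normalized.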

3.1.6 Plotting Derived Indicators

3.1.6.1 Plotting Foreign vs Local Residents

Brazil_cities_Derived[73:74]%>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

3.1.6.2 Plotting Domestic Rural versus Urban

Brazil_cities_Derived[75:76] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

3.1.6.3 Plotting Age Ratios

Brazil_cities_Derived[77:82] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

3.1.6.4 GVA Ratios

Brazil_cities_Derived[83:86] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

3.1.6.5 Company Type Ratios

Brazil_cities_Derived[87:106] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()
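The plots above select columns by position (for example `[73:74]`), which silently picks the wrong variables if a column is added or removed upstream. A minimal sketch of selecting the same kind of range by name is shown below; `cols_between` is a hypothetical helper, the `toy` data frame is made up, and base R `stack()` is swapped in for `gather()` to keep the sketch free of package dependencies.

```r
# Toy stand-in for Brazil_cities_Derived (hypothetical values, illustration only)
toy <- data.frame(
  CITY                  = c("a", "b", "c"),
  RES_BRAZ_POP_RATIO    = c(1.00, 0.99, 1.00),
  RES_FOREIGN_POP_RATIO = c(0.00, 0.01, 0.00),
  DOM_URBAN_RATIO       = c(0.72, 0.64, 0.69)
)

# Hypothetical helper: names of all columns from `first` to `last`, inclusive
cols_between <- function(df, first, last) {
  idx <- match(c(first, last), names(df))
  stopifnot(!anyNA(idx), idx[1] <= idx[2])
  names(df)[idx[1]:idx[2]]
}

resident_cols <- cols_between(toy, "RES_BRAZ_POP_RATIO", "RES_FOREIGN_POP_RATIO")

# Long format: one row per (value, column name) pair
long <- stack(toy[resident_cols])
print(long)
```

The resulting long table has a `values` column and an `ind` column holding the variable name, so it could be passed to `ggplot()` with `facet_wrap(~ ind, scales = "free")` in place of the `key`/`value` pair produced by `gather()`.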

3.2 Geospatial Data Wrangling

3.2.1 Converting Aspatial Data into Geospatial Point Dataframe

Brazil_cities.sf <- st_as_sf(Brazil_cities_Derived,
                            coords = c("LONG", "LAT"),
                            crs=4326) %>%
  st_transform(crs=4674)
head(Brazil_cities.sf)
## Simple feature collection with 6 features and 104 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -49.44055 ymin: -19.15585 xmax: -39.04755 ymax: -1.72347
## geographic CRS: SIRGAS 2000
##               CITY_STATE                CITY STATE CAPITAL IBGE_RES_POP
## 1     Abadia De Goiás_GO     Abadia De Goiás    GO       0         6876
## 2 Abadia Dos Dourados_MG Abadia Dos Dourados    MG       0         6704
## 3           Abadiânia_GO           Abadiânia    GO       0        15757
## 4              Abaeté_MG              Abaeté    MG       0        22690
## 5          Abaetetuba_PA          Abaetetuba    PA       0       141100
## 6             Abaiara_CE             Abaiara    CE       0        10496
##   IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1              6876                 0    2137          1546           591
## 2              6704                 0    2328          1481           847
## 3             15609               148    4655          3233          1422
## 4             22690                 0    7694          6667          1027
## 5            141040                60   31061         19057         12004
## 6             10496                 0    2791          1251          1540
##   IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## 1     5300     69      318      438        517       3542      416
## 2     4154     38      207      260        351       2709      589
## 3    10656    139      650      894       1087       6896      990
## 4    18464    176      856     1233       1539      11979     2681
## 5    82956   1354     5567     7618       8905      53516     5996
## 6     4538     98      323      421        483       2631      582
##   IDHM Ranking 2010  IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao     ALT
## 1              1689 0.708      0.687            0.830         0.622  893.60
## 2              2207 0.690      0.693            0.839         0.563  753.12
## 3              2202 0.690      0.671            0.841         0.579 1017.55
## 4              1994 0.698      0.720            0.848         0.556  644.74
## 5              3530 0.628      0.579            0.798         0.537   10.12
## 6              3522 0.628      0.540            0.748         0.612  403.11
##      AREA GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC  GVA_TOTAL     TAXES
## 1  147.26        6.20     27991.25     74750.32   36915.04   139662.81 20554.20
## 2  881.06    50524.57     25917.70     62689.23   28083.79   167215.29 12873.50
## 3 1045.13       42.84     16728.30    138198.58   63396.20   218365.92 26822.58
## 4 1817.07   113824.60     31002.62       172.33   86081.41   231080.96 26994.09
## 5 1610.65   140463.72     58610.00    468128.69  486872.40  1154074.81 95180.48
## 6  180.08     4435.16         5.88        22.81   35989.96    40453.81  4042.79
##          GDP POP_GDP GDP_CAPITA COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E
## 1     166.41    8053   20664.57      284      5      1     56      0      2
## 2     180.09    7037   25591.70      476      6      6     30      1      2
## 3  287984.49   18427   15628.40      288      5      9     26      0      2
## 4  430235.36   23574   18250.42      621     18      1     40      0      1
## 5 1249255.29  151934    8222.36      931      4      2     43      0      1
## 6   73151.46   11483    6370.41       86      1      0      4      0      0
##   COMP_F COMP_G COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P
## 1     29    110     26      4      5      0      2     10     12      4      6
## 2     34    190     70     28     11      0      4     15     29      2      9
## 3      7    117     12     57      2      1      0      7     15      3     11
## 4     20    303     62     30      9      6      4     28     27      2     15
## 5     27    500     16     31      6      1      1     22     16      2    155
## 6      6     48      2     10      2      0      0      2      3      2      0
##   COMP_Q COMP_R COMP_S COMP_U RURAL_URBANIntermediário Adjacente
## 1      6      1      5      0                                  0
## 2     14      6     19      0                                  0
## 3      5      1      8      0                                  0
## 4     19      9     27      0                                  0
## 5     33     15     56      0                                  0
## 6      2      0      4      0                                  0
##   RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
## 1                               0                          0
## 2                               0                          1
## 3                               0                          1
## 4                               0                          0
## 5                               0                          0
## 6                               0                          1
##   RURAL_URBANRural Remoto RURAL_URBANUrbano
## 1                       0                 1
## 2                       0                 0
## 3                       0                 0
## 4                       0                 1
## 5                       0                 1
## 6                       0                 0
##   GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
## 1                                                                            0
## 2                                                                            0
## 3                                                                            0
## 4                                                                            0
## 5                                                                            1
## 6                                                                            1
##   GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
## 1                                                                   0
## 2                                                                   0
## 3                                                                   0
## 4                                                                   0
## 5                                                                   0
## 6                                                                   0
##   GVA_MAINComércio e reparação de veículos automotores e motocicletas
## 1                                                                   0
## 2                                                                   0
## 3                                                                   0
## 4                                                                   0
## 5                                                                   0
## 6                                                                   0
##   GVA_MAINConstrução GVA_MAINDemais serviços
## 1                  0                       1
## 2                  0                       1
## 3                  0                       1
## 4                  0                       1
## 5                  0                       0
## 6                  0                       0
##   GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
## 1                                                                                            0
## 2                                                                                            0
## 3                                                                                            0
## 4                                                                                            0
## 5                                                                                            0
## 6                                                                                            0
##   GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
## 1                                   0                             0
## 2                                   0                             0
## 3                                   0                             0
## 4                                   0                             0
## 5                                   0                             0
## 6                                   0                             0
##   GVA_MAINPecuária, inclusive apoio à pecuária
## 1                                            0
## 2                                            0
## 3                                            0
## 4                                            0
## 5                                            0
## 6                                            0
##   GVA_MAINProdução florestal, pesca e aquicultura RES_BRAZ_POP_RATIO
## 1                                               0          1.0000000
## 2                                               0          1.0000000
## 3                                               0          0.9906073
## 4                                               0          1.0000000
## 5                                               0          0.9995748
## 6                                               0          1.0000000
##   RES_FOREIGN_POP_RATIO DOM_URBAN_RATIO DOM_RURAL_RATIO POP_BEL_ONE_RATIO
## 1          0.0000000000       0.7234441       0.2765559       0.013018868
## 2          0.0000000000       0.6361684       0.3638316       0.009147809
## 3          0.0093926509       0.6945220       0.3054780       0.013044294
## 4          0.0000000000       0.8665194       0.1334806       0.009532062
## 5          0.0004252303       0.6135347       0.3864653       0.016321906
## 6          0.0000000000       0.4482264       0.5517736       0.021595416
##   POP_ONE_to_FOUR_RATIO POP_FIVE_to_NINE_RATIO POP_TEN_to_FOURTEEN_RATIO
## 1            0.06000000             0.08264151                0.09754717
## 2            0.04983149             0.06259027                0.08449687
## 3            0.06099850             0.08389640                0.10200826
## 4            0.04636049             0.06677860                0.08335139
## 5            0.06710786             0.09183181                0.10734606
## 6            0.07117673             0.09277215                0.10643455
##   POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO
## 1         0.6683019        0.07849057      4.439263e-05        0.200420212
## 2         0.6521425        0.14179104      3.021528e-01        0.154995993
## 3         0.6471471        0.09290541      1.961845e-04        0.076606734
## 4         0.6487760        0.14520147      4.925746e-01        0.134163455
## 5         0.6451131        0.07227928      1.217111e-01        0.050785269
## 6         0.5797708        0.12825033      1.096352e-01        0.000145351
##   GVA_SERVICES_RATIO GVA_PUBLIC_RATIO COM_A_RATIO COM_B_RATIO COM_C_RATIO
## 1       0.5352199344        0.2643155 0.017605634 0.003521127  0.19718310
## 2       0.3749013024        0.1679499 0.012605042 0.012605042  0.06302521
## 3       0.6328761374        0.2903209 0.017361111 0.031250000  0.09027778
## 4       0.0007457559        0.3725162 0.028985507 0.001610306  0.06441224
## 5       0.4056311479        0.4218725 0.004296455 0.002148228  0.04618690
## 6       0.0005638529        0.8896556 0.011627907 0.000000000  0.04651163
##   COM_D_RATIO COM_E_RATIO COM_F_RATIO COM_G_RATIO COM_H_RATIO COM_I_RATIO
## 1  0.00000000 0.007042254  0.10211268   0.3873239  0.09154930  0.01408451
## 2  0.00210084 0.004201681  0.07142857   0.3991597  0.14705882  0.05882353
## 3  0.00000000 0.006944444  0.02430556   0.4062500  0.04166667  0.19791667
## 4  0.00000000 0.001610306  0.03220612   0.4879227  0.09983897  0.04830918
## 5  0.00000000 0.001074114  0.02900107   0.5370569  0.01718582  0.03329753
## 6  0.00000000 0.000000000  0.06976744   0.5581395  0.02325581  0.11627907
##   COM_J_RATIO COM_K_RATIO COM_L_RATIO COM_M_RATIO COM_N_RATIO COM_O_RATIO
## 1 0.017605634 0.000000000 0.007042254  0.03521127  0.04225352 0.014084507
## 2 0.023109244 0.000000000 0.008403361  0.03151261  0.06092437 0.004201681
## 3 0.006944444 0.003472222 0.000000000  0.02430556  0.05208333 0.010416667
## 4 0.014492754 0.009661836 0.006441224  0.04508857  0.04347826 0.003220612
## 5 0.006444683 0.001074114 0.001074114  0.02363050  0.01718582 0.002148228
## 6 0.023255814 0.000000000 0.000000000  0.02325581  0.03488372 0.023255814
##   COM_P_RATIO COM_Q_RATIO COM_R_RATIO COM_S_RATIO COM_U_RATIO POP_DENSITY
## 1  0.02112676  0.02112676 0.003521127  0.01760563           0    54.68559
## 2  0.01890756  0.02941176 0.012605042  0.03991597           0     7.98697
## 3  0.03819444  0.01736111 0.003472222  0.02777778           0    17.63130
## 4  0.02415459  0.03059581 0.014492754  0.04347826           0    12.97363
## 5  0.16648765  0.03544576 0.016111708  0.06015038           0    94.33086
## 6  0.00000000  0.02325581 0.000000000  0.04651163           0    63.76610
##                      geometry
## 1 POINT (-49.44055 -16.75881)
## 2 POINT (-47.39683 -18.48756)
## 3 POINT (-48.71881 -16.18267)
## 4 POINT (-45.44619 -19.15585)
## 5   POINT (-48.8844 -1.72347)
## 6 POINT (-39.04755 -7.356977)

We transform the CRS to EPSG:4674 (SIRGAS 2000), as per the geobr documentation, so that the data points map accurately onto the Brazil municipality boundaries.
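The conversion can be checked in isolation on a couple of points. The sketch below reuses two coordinate pairs from the output above and assumes, as the main code does, that the LONG and LAT columns are recorded in WGS 84 (EPSG:4326).

```r
library(sf)

# Two points taken from the head() output above
pts <- data.frame(
  CITY = c("Abadia De Goiás", "Abaetetuba"),
  LONG = c(-49.44055, -48.8844),
  LAT  = c(-16.75881, -1.72347)
)

# Declare the source CRS (assumed WGS 84), then transform to SIRGAS 2000
pts_wgs84  <- st_as_sf(pts, coords = c("LONG", "LAT"), crs = 4326)
pts_sirgas <- st_transform(pts_wgs84, crs = 4674)

# Confirm the target CRS and see how far the coordinates moved
print(st_crs(pts_sirgas)$epsg)
print(max(abs(st_coordinates(pts_sirgas) - st_coordinates(pts_wgs84))))
```

The printed geometry in the output above shows the same coordinates as the input columns at the displayed precision, which suggests the shift between the two systems is very small here. The transformation still matters because it gives the points the same CRS as the municipality polygons, which the spatial join later relies on.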

3.2.1.1 Validity checking data

Validity_NA_Check(Brazil_cities.sf)
## [1] "For: Brazil_cities.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"

3.2.2 Importing Municipal Geospatial Data

#muni.sf <- read_municipality(year=2010)

We load the 2010 municipality boundaries so that they align with the latitude and longitude values in our aspatial dataset, which are dated 2010. The call is commented out because we save the data locally after cleaning to reduce processing time.

3.2.3 Inspecting Geospatial Data

#Validity_NA_Check(muni.sf)
#muni.sf <- st_make_valid(muni.sf)
#Validity_NA_Check(muni.sf)
#muni.sp <- as_Spatial(muni.sf)
#writeOGR(muni.sp, "./data/geospatial", "Brazil_Muni", driver="ESRI Shapefile")

The lines above were commented out to reduce loading times. We will load the file locally and check its validity.
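An alternative to commenting code in and out is to cache the cleaned object and rebuild it only when the cache file is missing. The sketch below is one possible pattern, not the approach used in this exercise: `load_or_build` is a hypothetical helper, and it stores the object as an RDS file rather than a shapefile.

```r
# Hypothetical helper: read a cached object if present, otherwise build and cache it
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  saveRDS(obj, path)
  obj
}

# Demonstration with a cheap stand-in for the slow download-and-clean step
cache_file <- tempfile(fileext = ".rds")
calls <- 0
slow_step <- function() {
  calls <<- calls + 1  # count how often the expensive step actually runs
  data.frame(code_mn = c(1, 2), name_mn = c("A", "B"))
}

first  <- load_or_build(cache_file, slow_step)  # builds and writes the cache
second <- load_or_build(cache_file, slow_step)  # reads from the cache
print(calls)
```

In this exercise the `build` argument could be something like `function() st_make_valid(read_municipality(year = 2010))`, with `path` pointing at a file under `data/geospatial`.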

tmap_mode("plot")
muni_loaded.sf <- st_read(dsn="data/geospatial", layer="Brazil_Muni")
## Reading layer `Brazil_Muni' from data source `D:\GSA\Take_Home_EX04\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 5567 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -73.99045 ymin: -33.75208 xmax: -28.83609 ymax: 5.271841
## geographic CRS: GRS 1980(IUGG, 1980)
st_crs(muni_loaded.sf) <- 4674
qtm(muni_loaded.sf)

Validity_NA_Check(muni_loaded.sf)
## [1] "For: muni_loaded.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"

3.2.3.1 Creating unique identifier

muni_loaded_w_unique.sf <- cbind(CITY_STATE_M = paste(muni_loaded.sf$name_mn, muni_loaded.sf$abbrv_s, sep="_"), muni_loaded.sf)

3.2.4 Mapping points on map

tm_shape(muni_loaded_w_unique.sf)+
  tm_fill(col= "code_mn")+
  tm_shape(Brazil_cities.sf)+
  tm_dots(size = 0.01)

Based on the map above, we can observe that the points are accurately mapped to their respective municipalities in Brazil. We will create a combined dataframe to allow us to perform our next phase of choropleth mapping.

#Brazil_cities.sf <- Brazil_cities.sf[!(Brazil_cities.sf$CITY_STATE =="Fernando De Noronha_PE"), ]
tmap_mode("plot")

3.2.5 Building SuperFrame

Brazil_super.sf <- st_join(muni_loaded_w_unique.sf, Brazil_cities.sf, join=st_intersects)
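To illustrate how this kind of spatial join behaves, here is a small sketch using made-up geometries. The helper `sq()` and the objects `polys`, `pts` and `joined` are hypothetical, and no CRS is set so that the toy example stays planar.

```r
library(sf)

# Helper (hypothetical): a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two toy polygons standing in for municipalities
polys <- st_sf(name = c("A", "B"), geometry = st_sfc(sq(0), sq(2)))

# One toy point that falls inside polygon "A" only
pts <- st_sf(value = 10, geometry = st_sfc(st_point(c(0.5, 0.5))))

# st_join() is a left join by default: every polygon is kept,
# and polygons that contain no point receive NA attributes
joined <- st_join(polys, pts, join = st_intersects)
joined$value
```

A polygon that contains several points would appear once per matching point, which is why the joined frame is worth checking for both NA rows and duplicates.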

3.2.5.1 Validating and cleaning superframe

Validity_NA_Check(Brazil_super.sf)
## [1] "For: Brazil_super.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 3"

Checking NA Row locations

temp_NA <- Brazil_super.sf[rowSums(is.na(Brazil_super.sf))!=0,]
as.character(temp_NA$name_mn)
## [1] "Santa Teresinha" "Lagoa Mirim"     "Lagoa Dos Patos"

Based on the output above, two of the polygons with NA values are lakes (Lagoa Mirim and Lagoa Dos Patos), and the third is Santa Teresinha, which we removed during data cleaning because of missing values. The remaining polygons should therefore have data mapped to them correctly, unless a polygon contains more than one point.

Removing NA rows

Brazil_super_cleaned.sf<- Brazil_super.sf[rowSums(is.na(Brazil_super.sf))==0,]
Validity_NA_Check(Brazil_super_cleaned.sf)
## [1] "For: Brazil_super_cleaned.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"

Checking for duplicates

dim(Brazil_super_cleaned.sf[duplicated(Brazil_super_cleaned.sf$CITY_STATE.x),])
## [1]   0 110

There appear to be no duplicate rows, which means that each polygon has only one data point attached to it.
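For reference, the sketch below shows what this check would return if a polygon had received two points. The table `toy_join` and its values are hypothetical.

```r
# Hypothetical joined table: polygon "B" received two points
toy_join <- data.frame(
  polygon = c("A", "B", "B", "C"),
  city    = c("a1", "b1", "b2", "c1")
)

# duplicated() flags the second and later occurrences of each key
dup_rows <- toy_join[duplicated(toy_join$polygon), ]
nrow(dup_rows)
```

A result with zero rows, as in our case, indicates that no key occurs more than once.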

4 Choropleth Map plotting

tmap_mode("plot")
tm_shape(Brazil_super_cleaned.sf)+
  tm_fill(col= "GDP_CAPITA",
          style="jenks",
          title = "GDP per Capita",
          palette ="Greens")+
  tm_layout(main.title = "Distribution of GDP per Capita by Municipality \n(Jenks classification)",
            main.title.position = "center",
            main.title.size = 1,
            legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) +
  tm_borders(alpha = 0.1)

Based on the map above, we can see a surprising result for GDP per capita. The highest values appear in the satellite cities around Sao Paulo rather than in the main city itself. Additionally, far inland in areas like Selviria and Campos De Júlio, we can also see concentrations of higher GDP per capita. This could be due to a small population in a region that still generates a large amount of production, which is surprising given the larger areas of these polygons.

What is even more surprising is that the two main cities in Brazil, Rio De Janeiro and Sao Paulo, have GDP per capita of only 50,690 and 57,071 respectively. This is most likely due to a much larger population concentrated in these smaller areas, which is concerning from a social development standpoint.

5 Multiple Linear Regression

5.1 Data Preparation for Regression

5.1.1 Removal of unnecessary columns

dropsAbrev <- c("CITY_STATE_M", "code_mn", "name_mn", "cod_stt", "abbrv_s", "CITY", "STATE")

Brazil_reg.sf <- Brazil_super_cleaned.sf[ , !(names(Brazil_super_cleaned.sf) %in% dropsAbrev)]

5.1.2 Separating Categorical from Numerical

Brazil_numeric_vars <- cbind(Brazil_reg.sf[,3:28]%>%
  st_set_geometry(NULL), Brazil_reg.sf[,32:52]%>%
  st_set_geometry(NULL), Brazil_reg.sf[,102]%>%
  st_set_geometry(NULL))
  
Brazil_numeric_vars.norm <- normalize(Brazil_numeric_vars)

Brazil_Ratios_vars <- Brazil_reg.sf[,68:101] %>%
  st_set_geometry(NULL)

Brazil_Categorical_vars <- cbind(Brazil_reg.sf[,2]%>%
  st_set_geometry(NULL), Brazil_reg.sf[,53:67]%>%
  st_set_geometry(NULL))
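The `normalize()` call above comes from heatmaply and, to our understanding, rescales each numeric column to the range 0 to 1. A minimal base R sketch of that min-max scaling is given below; the helper `min_max()` and the data frame `toy` are hypothetical and only illustrate the idea.

```r
# Min-max scaling for one numeric vector:
# the minimum maps to 0 and the maximum maps to 1
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical values, not taken from the dataset
toy <- data.frame(ALT = c(100, 550, 1000), AREA = c(10, 20, 50))

# Apply the scaling column by column
toy_norm <- as.data.frame(lapply(toy, min_max))
toy_norm
```

Scaling in this way puts variables with very different units on a comparable footing, but it is sensitive to extreme values because a single outlier compresses the rest of the column towards 0.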

5.1.2.1 Creating an “All variables” dataframe for correlational plot checking.

dropsReg <- c("CITY_STATE", "GDP", "GDP_CAPITA", "POP_GDP")

Brazil_All_vars <- Brazil_reg.sf[ , !(names(Brazil_reg.sf) %in% dropsReg)] %>%
  st_set_geometry(NULL)

5.1.3 Perform Correlational analysis of variables

corrplot(cor(Brazil_numeric_vars.norm, use = "complete.obs"), diag = FALSE, order = "AOE",
        tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")

corrplot(cor(Brazil_Ratios_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
        tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

corrplot(cor(Brazil_Categorical_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
        tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")

# The all-variables plot is omitted for display reasons, although it was checked during the analysis to ensure that no variables are too highly correlated
# corrplot(cor(Brazil_All_vars, use = "complete.obs"), diag = FALSE, order = "AOE",
        #tl.pos = "td", tl.cex = 0.5, method = "square", type = "upper")

As expected, a number of indicators in our numeric dataset are clearly highly correlated with one another, notably the IBGE, GVA, TAXES and COMP numbers. Because of their correlation with COMP_TOT, we will use it as a single metric to capture all of them, as it is the likely driver of those variables (particularly taxes). We will also use IDHM as a measure for all the IDHM indicators, although there will be some loss of information.

Within the ratios, we can see that the youth population ratios are very highly correlated with one another. As these are ratios, we can sum them to give a new Youth metric instead. Additionally, because DOM_RURAL_RATIO and DOM_URBAN_RATIO, and likewise RES_BRAZ_POP_RATIO and RES_FOREIGN_POP_RATIO, are polar opposites, we only need one of each pair as an indicator. In our case, we will choose the Foreign Population ratio and the Domestic Urban Units ratio.
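The reason one ratio of a complementary pair is enough can be shown with a short sketch. The vectors `urban` and `rural` hold hypothetical shares, not values from the dataset.

```r
# Hypothetical urban shares for five municipalities
urban <- c(0.20, 0.45, 0.66, 0.83, 1.00)

# The rural share is the complement, so the two sum to 1 in every row
rural <- 1 - urban

# Complementary ratios are perfectly negatively correlated
cor(urban, rural)
```

Because the second ratio carries no information beyond the first, including both in a regression would only introduce perfect collinearity.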

5.1.4 Extracting and combining variables to be useful

5.1.4.1 Extracting useful numeric

Brazil_numeric_vars_pro <- Brazil_numeric_vars.norm %>% select("ALT", "AREA", "IDHM", "POP_DENSITY", "COMP_TOT")

5.1.4.2 Remodelling and extracting ratios

Brazil_Ratios_vars_pro <-  Brazil_Ratios_vars %>%
   mutate( POP_YOUTH_RATIO = as.numeric((POP_BEL_ONE_RATIO + POP_ONE_to_FOUR_RATIO + POP_FIVE_to_NINE_RATIO + POP_TEN_to_FOURTEEN_RATIO)))

dropsRatios <- c("POP_BEL_ONE_RATIO", "POP_ONE_to_FOUR_RATIO", "POP_FIVE_to_NINE_RATIO", "POP_TEN_to_FOURTEEN_RATIO", "RES_BRAZ_POP_RATIO", "DOM_RURAL_RATIO")

Brazil_Ratios_vars_pro <- Brazil_Ratios_vars_pro[ , !(names(Brazil_Ratios_vars_pro) %in% dropsRatios)] 

5.1.4.3 Combining Variables for mapping

Brazil_indicators <- cbind(Brazil_Ratios_vars_pro, Brazil_Categorical_vars, Brazil_numeric_vars_pro)

5.1.4.4 Performing Correlational matrix plot once more

corrplot(cor(Brazil_indicators, use = "complete.obs"), diag = FALSE, order = "AOE",
        tl.pos = "td", tl.cex = 0.4, number.cex= 0.3, method = "number", type = "upper")

Based on our correlation plot, we don't see any pair of variables with a correlation beyond 0.75. As such, we will use these variables in our regression.
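With this many variables the plot can be hard to read, so the same threshold check can also be done programmatically. The sketch below is one possible approach on a hypothetical data frame `toy`; the 0.75 cut-off is the one used in this section.

```r
# Hypothetical data: b closely tracks a, while c is unrelated to both
toy <- data.frame(
  a = 1:10,
  b = 1:10 + c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1),
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

cm <- cor(toy, use = "complete.obs")

# Blank out the diagonal and lower triangle so each pair is listed once
cm[lower.tri(cm, diag = TRUE)] <- NA

# Locate the pairs whose absolute correlation exceeds the threshold
high <- which(abs(cm) > 0.75, arr.ind = TRUE)
data.frame(var1 = rownames(cm)[high[, "row"]],
           var2 = colnames(cm)[high[, "col"]],
           r    = cm[high])
```

An empty result would confirm what the plot suggests, namely that no pair of remaining indicators is correlated beyond the threshold.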

5.1.5 Forming Final Simple Feature Dataframe

polygon_frame <- Brazil_reg.sf %>% select("CITY_STATE")
joining_frame <- Brazil_reg.sf %>% select("CITY_STATE", "GDP_CAPITA") %>% st_set_geometry(NULL)
joining_frame_states <- cbind(joining_frame, Brazil_indicators)
## A join on an index would be more usual, but after checking we found that the rows align with Brazil_reg.sf, so we can assume the data was joined correctly to the original sf
Brazil_Indicators.sf <- left_join(polygon_frame, joining_frame_states, by="CITY_STATE")

5.1.5.1 Validating and summarizing variables

Validity_NA_Check(Brazil_Indicators.sf)
## [1] "For: Brazil_Indicators.sf"
## [1] "Number of Invalid polygons/points is: 0"
## [1] "Number of NA rows is: 0"
summary(Brazil_Indicators.sf)
##                   CITY_STATE     GDP_CAPITA     RES_FOREIGN_POP_RATIO
##  Abadia De Goiás_GO    :   1   Min.   :  3191   Min.   :0.0000000    
##  Abadia Dos Dourados_MG:   1   1st Qu.:  9062   1st Qu.:0.0000000    
##  Abadiânia_GO          :   1   Median : 15870   Median :0.0000000    
##  Abaeté_MG             :   1   Mean   : 21122   Mean   :0.0007593    
##  Abaetetuba_PA         :   1   3rd Qu.: 26155   3rd Qu.:0.0006992    
##  Abaiara_CE            :   1   Max.   :314638   Max.   :0.3772182    
##  (Other)               :5558                                         
##  DOM_URBAN_RATIO   POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO
##  Min.   :0.04553   Min.   :0.4716    Min.   :0.02255   Min.   :0.00000  
##  1st Qu.:0.49148   1st Qu.:0.6087    1st Qu.:0.09799   1st Qu.:0.03364  
##  Median :0.66263   Median :0.6325    Median :0.11921   Median :0.15062  
##  Mean   :0.65205   Mean   :0.6308    Mean   :0.12009   Mean   :0.21034  
##  3rd Qu.:0.83040   3rd Qu.:0.6543    3rd Qu.:0.14103   3rd Qu.:0.34094  
##  Max.   :1.00000   Max.   :0.7448    Max.   :0.42199   Max.   :0.99877  
##                                                                         
##  GVA_INDUSTRY_RATIO  GVA_SERVICES_RATIO  GVA_PUBLIC_RATIO     COM_A_RATIO      
##  Min.   :0.0000157   Min.   :0.0000461   Min.   :0.0000433   Min.   :0.000000  
##  1st Qu.:0.0368730   1st Qu.:0.1985910   1st Qu.:0.1448472   1st Qu.:0.001569  
##  Median :0.0714602   Median :0.3117002   Median :0.2948082   Median :0.011803  
##  Mean   :0.1377745   Mean   :0.3260963   Mean   :0.3257928   Mean   :0.039408  
##  3rd Qu.:0.1795132   3rd Qu.:0.4600063   3rd Qu.:0.4966551   3rd Qu.:0.031915  
##  Max.   :0.9991868   Max.   :0.9995977   Max.   :0.9996029   Max.   :0.917085  
##                                                                                
##   COM_B_RATIO        COM_C_RATIO       COM_D_RATIO         COM_E_RATIO      
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.0000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.03636   1st Qu.:0.0000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.06590   Median :0.0000000   Median :0.000000  
##  Mean   :0.006019   Mean   :0.07967   Mean   :0.0007847   Mean   :0.002508  
##  3rd Qu.:0.005188   3rd Qu.:0.10593   3rd Qu.:0.0000000   3rd Qu.:0.003226  
##  Max.   :0.333333   Max.   :0.54518   Max.   :0.4444444   Max.   :0.083333  
##                                                                             
##   COM_F_RATIO       COM_G_RATIO       COM_H_RATIO       COM_I_RATIO     
##  Min.   :0.00000   Min.   :0.01789   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.01389   1st Qu.:0.38980   1st Qu.:0.01562   1st Qu.:0.02128  
##  Median :0.02778   Median :0.46396   Median :0.03757   Median :0.04167  
##  Mean   :0.03130   Mean   :0.47234   Mean   :0.04955   Mean   :0.04567  
##  3rd Qu.:0.04348   3rd Qu.:0.55263   3rd Qu.:0.07052   3rd Qu.:0.06202  
##  Max.   :0.29213   Max.   :0.89091   Max.   :0.43689   Max.   :0.52542  
##                                                                         
##   COM_J_RATIO        COM_K_RATIO        COM_L_RATIO        COM_M_RATIO     
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.01144  
##  Median :0.007299   Median :0.000000   Median :0.000000   Median :0.02362  
##  Mean   :0.009054   Mean   :0.003933   Mean   :0.005450   Mean   :0.02536  
##  3rd Qu.:0.013982   3rd Qu.:0.006112   3rd Qu.:0.008601   3rd Qu.:0.03659  
##  Max.   :0.417249   Max.   :0.087912   Max.   :0.156863   Max.   :0.24444  
##                                                                            
##   COM_N_RATIO       COM_O_RATIO         COM_P_RATIO       COM_Q_RATIO      
##  Min.   :0.00000   Min.   :0.0001764   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.01802   1st Qu.:0.0058954   1st Qu.:0.01786   1st Qu.:0.006615  
##  Median :0.02924   Median :0.0153846   Median :0.02985   Median :0.019946  
##  Mean   :0.03553   Mean   :0.0277867   Mean   :0.04350   Mean   :0.022028  
##  3rd Qu.:0.04496   3rd Qu.:0.0361664   3rd Qu.:0.04878   3rd Qu.:0.033033  
##  Max.   :0.33527   Max.   :0.3636364   Max.   :0.83673   Max.   :0.214286  
##                                                                            
##   COM_R_RATIO        COM_S_RATIO       COM_U_RATIO        POP_YOUTH_RATIO 
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.000e+00   Min.   :0.1064  
##  1st Qu.:0.000000   1st Qu.:0.04116   1st Qu.:0.000e+00   1st Qu.:0.2153  
##  Median :0.009091   Median :0.06395   Median :0.000e+00   Median :0.2452  
##  Mean   :0.010772   Mean   :0.08933   Mean   :2.036e-06   Mean   :0.2491  
##  3rd Qu.:0.015310   3rd Qu.:0.11147   3rd Qu.:0.000e+00   3rd Qu.:0.2771  
##  Max.   :0.166667   Max.   :0.56716   Max.   :2.985e-03   Max.   :0.4408  
##                                                                           
##     CAPITAL         RURAL_URBANIntermediário Adjacente
##  Min.   :0.000000   Min.   :0.0000                    
##  1st Qu.:0.000000   1st Qu.:0.0000                    
##  Median :0.000000   Median :0.0000                    
##  Mean   :0.004853   Mean   :0.1233                    
##  3rd Qu.:0.000000   3rd Qu.:0.0000                    
##  Max.   :1.000000   Max.   :1.0000                    
##                                                       
##  RURAL_URBANIntermediário Remoto RURAL_URBANRural Adjacente
##  Min.   :0.00000                 Min.   :0.0000            
##  1st Qu.:0.00000                 1st Qu.:0.0000            
##  Median :0.00000                 Median :1.0000            
##  Mean   :0.01078                 Mean   :0.5462            
##  3rd Qu.:0.00000                 3rd Qu.:1.0000            
##  Max.   :1.00000                 Max.   :1.0000            
##                                                            
##  RURAL_URBANRural Remoto RURAL_URBANUrbano
##  Min.   :0.00000         Min.   :0.0000   
##  1st Qu.:0.00000         1st Qu.:0.0000   
##  Median :0.00000         Median :0.0000   
##  Mean   :0.05805         Mean   :0.2617   
##  3rd Qu.:0.00000         3rd Qu.:1.0000   
##  Max.   :1.00000         Max.   :1.0000   
##                                           
##  GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social
##  Min.   :0.0000                                                              
##  1st Qu.:0.0000                                                              
##  Median :0.0000                                                              
##  Mean   :0.4892                                                              
##  3rd Qu.:1.0000                                                              
##  Max.   :1.0000                                                              
##                                                                              
##  GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita
##  Min.   :0.0000                                                     
##  1st Qu.:0.0000                                                     
##  Median :0.0000                                                     
##  Mean   :0.1317                                                     
##  3rd Qu.:0.0000                                                     
##  Max.   :1.0000                                                     
##                                                                     
##  GVA_MAINComércio e reparação de veículos automotores e motocicletas
##  Min.   :0.000000                                                   
##  1st Qu.:0.000000                                                   
##  Median :0.000000                                                   
##  Mean   :0.008267                                                   
##  3rd Qu.:0.000000                                                   
##  Max.   :1.000000                                                   
##                                                                     
##  GVA_MAINConstrução GVA_MAINDemais serviços
##  Min.   :0.000000   Min.   :0.0000         
##  1st Qu.:0.000000   1st Qu.:0.0000         
##  Median :0.000000   Median :0.0000         
##  Mean   :0.001258   Mean   :0.2653         
##  3rd Qu.:0.000000   3rd Qu.:1.0000         
##  Max.   :1.000000   Max.   :1.0000         
##                                            
##  GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação
##  Min.   :0.00000                                                                             
##  1st Qu.:0.00000                                                                             
##  Median :0.00000                                                                             
##  Mean   :0.01761                                                                             
##  3rd Qu.:0.00000                                                                             
##  Max.   :1.00000                                                                             
##                                                                                              
##  GVA_MAINIndústrias de transformação GVA_MAINIndústrias extrativas
##  Min.   :0.00000                     Min.   :0.00000              
##  1st Qu.:0.00000                     1st Qu.:0.00000              
##  Median :0.00000                     Median :0.00000              
##  Mean   :0.04691                     Mean   :0.00629              
##  3rd Qu.:0.00000                     3rd Qu.:0.00000              
##  Max.   :1.00000                     Max.   :1.00000              
##                                                                   
##  GVA_MAINPecuária, inclusive apoio à pecuária
##  Min.   :0.00000                             
##  1st Qu.:0.00000                             
##  Median :0.00000                             
##  Mean   :0.02894                             
##  3rd Qu.:0.00000                             
##  Max.   :1.00000                             
##                                              
##  GVA_MAINProdução florestal, pesca e aquicultura      ALT           
##  Min.   :0.000000                                Min.   :0.0000000  
##  1st Qu.:0.000000                                1st Qu.:0.0001941  
##  Median :0.000000                                Median :0.0004648  
##  Mean   :0.004493                                Mean   :0.0010222  
##  3rd Qu.:0.000000                                3rd Qu.:0.0007193  
##  Max.   :1.000000                                Max.   :1.0000000  
##                                                                     
##       AREA               IDHM         POP_DENSITY          COMP_TOT        
##  Min.   :0.000000   Min.   :0.0000   Min.   :0.000000   Min.   :0.0000000  
##  1st Qu.:0.001260   1st Qu.:0.4077   1st Qu.:0.000876   1st Qu.:0.0001169  
##  Median :0.002589   Median :0.5563   Median :0.001864   Median :0.0002941  
##  Mean   :0.009539   Mean   :0.5432   Mean   :0.008659   Mean   :0.0016997  
##  3rd Qu.:0.006412   3rd Qu.:0.6757   3rd Qu.:0.004091   3rd Qu.:0.0008356  
##  Max.   :1.000000   Max.   :1.0000   Max.   :1.000000   Max.   :1.0000000  
##                                                                            
##           geometry   
##  MULTIPOLYGON :5564  
##  epsg:4674    :   0  
##  +proj=long...:   0  
##                      
##                      
##                      
## 

5.2 Building Multi-linear regression model for contributory factors to GDP per capita

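When performing a multiple linear regression, we need to define our hypotheses for each explanatory variable:

  • Null Hypothesis: The variable has no linear relationship with GDP per capita (its coefficient is zero)
  • Alternative Hypothesis: The variable has a linear relationship with GDP per capita (its coefficient is not zero)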

We will be selecting a confidence level of 95% for this analysis, meaning we need a p-value below an alpha of 0.05 in order to reject the null hypothesis.
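As a small sketch of how this decision rule can be applied to a fitted model, the example below pulls the p-values out of an `lm` summary and compares them with 0.05. The vectors `x1`, `x2` and `y` are hypothetical; `y` is constructed to depend on `x1`.

```r
# Hypothetical data: y is driven by x1 plus a small disturbance
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
y  <- 2 * x1 + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.4, -0.1, -0.1)

fit <- lm(y ~ x1 + x2)

# The coefficient table of summary() holds one p-value per term
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# TRUE where the null hypothesis is rejected at alpha = 0.05
p_values < 0.05
```

The same extraction can be applied to the model fitted in the next subsection to list which indicators are statistically significant.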

5.2.1 Performing Linear Regression

Because we have categorical data and groups of ratios that each sum to 1, we need to decide which member of each group serves as our baseline:

  • Population Age ratios:
    • We will take the youth ratio (POP_YOUTH_RATIO) as our baseline
  • GVA Ratios:
    • We will take Public services to be our baseline
  • GVA Main categories:
    • GVA_MAINProdução florestal, pesca e aquicultura
  • Company Type Ratios:
    • COM_U_RATIO
  • Rural or Urban Classifications:
    • RURAL_URBANUrbano
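The need for a baseline can be seen in a small sketch: when every level of a category is supplied as its own 0/1 column alongside an intercept, the columns are linearly dependent and `lm()` cannot estimate all of them. The data frame `d` and its columns are hypothetical.

```r
# Hypothetical data with a three-level category
grp <- c("a", "a", "b", "b", "c", "c")
y   <- c(1, 2, 3, 4, 5, 6)

# One 0/1 dummy column per level; the three columns sum to 1 in every row
d <- data.frame(y = y,
                grp_a = as.numeric(grp == "a"),
                grp_b = as.numeric(grp == "b"),
                grp_c = as.numeric(grp == "c"))

# With an intercept, the last dummy is a linear combination of the others,
# so lm() reports its coefficient as NA and it acts as the baseline
fit <- lm(y ~ ., data = d)
coef(fit)
```

The remaining coefficients are then read relative to the dropped level: here the intercept is the mean of group "c", and `grp_a` and `grp_b` are differences from it.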
GDPPC.mlr<- lm(GDP_CAPITA ~ ., data=Brazil_Indicators.sf[2:53] %>% st_set_geometry(NULL))
summary(GDPPC.mlr)
## 
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_Indicators.sf[2:53] %>% 
##     st_set_geometry(NULL))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -40924  -5198   -713   3256 246925 
## 
## Coefficients: (5 not defined because of singularities)
##                                                                                                  Estimate
## (Intercept)                                                                                     5457565.3
## RES_FOREIGN_POP_RATIO                                                                             -5490.9
## DOM_URBAN_RATIO                                                                                   -1953.4
## POP_WORKING_RATIO                                                                                 37961.8
## POP_ELDERLY_RATIO                                                                                -34416.2
## GVA_AGROPEC_RATIO                                                                                  8195.2
## GVA_INDUSTRY_RATIO                                                                                22856.5
## GVA_SERVICES_RATIO                                                                                 4935.0
## GVA_PUBLIC_RATIO                                                                                       NA
## COM_A_RATIO                                                                                    -5474832.5
## COM_B_RATIO                                                                                    -5510719.9
## COM_C_RATIO                                                                                    -5504450.0
## COM_D_RATIO                                                                                    -5436381.1
## COM_E_RATIO                                                                                    -5436230.7
## COM_F_RATIO                                                                                    -5477177.6
## COM_G_RATIO                                                                                    -5479871.4
## COM_H_RATIO                                                                                    -5458575.7
## COM_I_RATIO                                                                                    -5467005.8
## COM_J_RATIO                                                                                    -5467998.6
## COM_K_RATIO                                                                                    -5364519.1
## COM_L_RATIO                                                                                    -5373763.3
## COM_M_RATIO                                                                                    -5460390.8
## COM_N_RATIO                                                                                    -5466424.3
## COM_O_RATIO                                                                                    -5461544.0
## COM_P_RATIO                                                                                    -5478092.3
## COM_Q_RATIO                                                                                    -5487459.3
## COM_R_RATIO                                                                                    -5479679.6
## COM_S_RATIO                                                                                    -5477920.3
## COM_U_RATIO                                                                                            NA
## POP_YOUTH_RATIO                                                                                        NA
## CAPITAL                                                                                           -8485.0
## `RURAL_URBANIntermediário Adjacente`                                                               -370.4
## `RURAL_URBANIntermediário Remoto`                                                                  4839.7
## `RURAL_URBANRural Adjacente`                                                                       1555.1
## `RURAL_URBANRural Remoto`                                                                          4727.9
## RURAL_URBANUrbano                                                                                      NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                    -9635.6
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`                              2708.8
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                             21939.3
## GVA_MAINConstrução                                                                                -7020.6
## `GVA_MAINDemais serviços`                                                                         -8217.5
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`    19968.9
## `GVA_MAINIndústrias de transformação`                                                             13099.6
## `GVA_MAINIndústrias extrativas`                                                                   14549.4
## `GVA_MAINPecuária, inclusive apoio à pecuária`                                                    -5565.2
## `GVA_MAINProdução florestal, pesca e aquicultura`                                                      NA
## ALT                                                                                               -5481.8
## AREA                                                                                               8959.2
## IDHM                                                                                              36695.3
## POP_DENSITY                                                                                        7831.5
## COMP_TOT                                                                                          36690.6
##                                                                                                Std. Error
## (Intercept)                                                                                     5758288.1
## RES_FOREIGN_POP_RATIO                                                                             56651.7
## DOM_URBAN_RATIO                                                                                    1462.6
## POP_WORKING_RATIO                                                                                 10630.0
## POP_ELDERLY_RATIO                                                                                  7766.5
## GVA_AGROPEC_RATIO                                                                                  1309.1
## GVA_INDUSTRY_RATIO                                                                                 1634.1
## GVA_SERVICES_RATIO                                                                                 1161.4
## GVA_PUBLIC_RATIO                                                                                       NA
## COM_A_RATIO                                                                                     5758215.0
## COM_B_RATIO                                                                                     5758386.0
## COM_C_RATIO                                                                                     5758212.1
## COM_D_RATIO                                                                                     5758474.6
## COM_E_RATIO                                                                                     5758512.8
## COM_F_RATIO                                                                                     5758246.2
## COM_G_RATIO                                                                                     5758265.9
## COM_H_RATIO                                                                                     5758277.5
## COM_I_RATIO                                                                                     5757930.3
## COM_J_RATIO                                                                                     5758076.2
## COM_K_RATIO                                                                                     5757653.4
## COM_L_RATIO                                                                                     5757834.4
## COM_M_RATIO                                                                                     5758161.2
## COM_N_RATIO                                                                                     5757990.4
## COM_O_RATIO                                                                                     5758278.8
## COM_P_RATIO                                                                                     5758244.6
## COM_Q_RATIO                                                                                     5758228.4
## COM_R_RATIO                                                                                     5758306.1
## COM_S_RATIO                                                                                     5758244.3
## COM_U_RATIO                                                                                            NA
## POP_YOUTH_RATIO                                                                                        NA
## CAPITAL                                                                                            3355.8
## `RURAL_URBANIntermediário Adjacente`                                                                738.4
## `RURAL_URBANIntermediário Remoto`                                                                  2083.3
## `RURAL_URBANRural Adjacente`                                                                        695.9
## `RURAL_URBANRural Remoto`                                                                          1085.4
## RURAL_URBANUrbano                                                                                      NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                     2983.5
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`                              2991.4
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                              3698.9
## GVA_MAINConstrução                                                                                 6278.1
## `GVA_MAINDemais serviços`                                                                          3025.8
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`     3389.8
## `GVA_MAINIndústrias de transformação`                                                              3168.8
## `GVA_MAINIndústrias extrativas`                                                                    3923.8
## `GVA_MAINPecuária, inclusive apoio à pecuária`                                                     3170.5
## `GVA_MAINProdução florestal, pesca e aquicultura`                                                      NA
## ALT                                                                                                9989.1
## AREA                                                                                               6234.3
## IDHM                                                                                               2842.5
## POP_DENSITY                                                                                        4947.6
## COMP_TOT                                                                                          14889.5
##                                                                                                t value
## (Intercept)                                                                                      0.948
## RES_FOREIGN_POP_RATIO                                                                           -0.097
## DOM_URBAN_RATIO                                                                                 -1.336
## POP_WORKING_RATIO                                                                                3.571
## POP_ELDERLY_RATIO                                                                               -4.431
## GVA_AGROPEC_RATIO                                                                                6.260
## GVA_INDUSTRY_RATIO                                                                              13.987
## GVA_SERVICES_RATIO                                                                               4.249
## GVA_PUBLIC_RATIO                                                                                    NA
## COM_A_RATIO                                                                                     -0.951
## COM_B_RATIO                                                                                     -0.957
## COM_C_RATIO                                                                                     -0.956
## COM_D_RATIO                                                                                     -0.944
## COM_E_RATIO                                                                                     -0.944
## COM_F_RATIO                                                                                     -0.951
## COM_G_RATIO                                                                                     -0.952
## COM_H_RATIO                                                                                     -0.948
## COM_I_RATIO                                                                                     -0.949
## COM_J_RATIO                                                                                     -0.950
## COM_K_RATIO                                                                                     -0.932
## COM_L_RATIO                                                                                     -0.933
## COM_M_RATIO                                                                                     -0.948
## COM_N_RATIO                                                                                     -0.949
## COM_O_RATIO                                                                                     -0.948
## COM_P_RATIO                                                                                     -0.951
## COM_Q_RATIO                                                                                     -0.953
## COM_R_RATIO                                                                                     -0.952
## COM_S_RATIO                                                                                     -0.951
## COM_U_RATIO                                                                                         NA
## POP_YOUTH_RATIO                                                                                     NA
## CAPITAL                                                                                         -2.528
## `RURAL_URBANIntermediário Adjacente`                                                            -0.502
## `RURAL_URBANIntermediário Remoto`                                                                2.323
## `RURAL_URBANRural Adjacente`                                                                     2.235
## `RURAL_URBANRural Remoto`                                                                        4.356
## RURAL_URBANUrbano                                                                                   NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                  -3.230
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`                            0.906
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                            5.931
## GVA_MAINConstrução                                                                              -1.118
## `GVA_MAINDemais serviços`                                                                       -2.716
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`   5.891
## `GVA_MAINIndústrias de transformação`                                                            4.134
## `GVA_MAINIndústrias extrativas`                                                                  3.708
## `GVA_MAINPecuária, inclusive apoio à pecuária`                                                  -1.755
## `GVA_MAINProdução florestal, pesca e aquicultura`                                                   NA
## ALT                                                                                             -0.549
## AREA                                                                                             1.437
## IDHM                                                                                            12.910
## POP_DENSITY                                                                                      1.583
## COMP_TOT                                                                                         2.464
##                                                                                                Pr(>|t|)
## (Intercept)                                                                                    0.343285
## RES_FOREIGN_POP_RATIO                                                                          0.922791
## DOM_URBAN_RATIO                                                                                0.181731
## POP_WORKING_RATIO                                                                              0.000358
## POP_ELDERLY_RATIO                                                                              9.55e-06
## GVA_AGROPEC_RATIO                                                                              4.14e-10
## GVA_INDUSTRY_RATIO                                                                              < 2e-16
## GVA_SERVICES_RATIO                                                                             2.18e-05
## GVA_PUBLIC_RATIO                                                                                     NA
## COM_A_RATIO                                                                                    0.341754
## COM_B_RATIO                                                                                    0.338614
## COM_C_RATIO                                                                                    0.339149
## COM_D_RATIO                                                                                    0.345177
## COM_E_RATIO                                                                                    0.345194
## COM_F_RATIO                                                                                    0.341550
## COM_G_RATIO                                                                                    0.341315
## COM_H_RATIO                                                                                    0.343195
## COM_I_RATIO                                                                                    0.342421
## COM_J_RATIO                                                                                    0.342346
## COM_K_RATIO                                                                                    0.351522
## COM_L_RATIO                                                                                    0.350708
## COM_M_RATIO                                                                                    0.343025
## COM_N_RATIO                                                                                    0.342477
## COM_O_RATIO                                                                                    0.342933
## COM_P_RATIO                                                                                    0.341470
## COM_Q_RATIO                                                                                    0.340643
## COM_R_RATIO                                                                                    0.341335
## COM_S_RATIO                                                                                    0.341485
## COM_U_RATIO                                                                                          NA
## POP_YOUTH_RATIO                                                                                      NA
## CAPITAL                                                                                        0.011484
## `RURAL_URBANIntermediário Adjacente`                                                           0.615964
## `RURAL_URBANIntermediário Remoto`                                                              0.020211
## `RURAL_URBANRural Adjacente`                                                                   0.025484
## `RURAL_URBANRural Remoto`                                                                      1.35e-05
## RURAL_URBANUrbano                                                                                    NA
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                 0.001247
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`                          0.365227
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                          3.19e-09
## GVA_MAINConstrução                                                                             0.263506
## `GVA_MAINDemais serviços`                                                                      0.006632
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` 4.07e-09
## `GVA_MAINIndústrias de transformação`                                                          3.62e-05
## `GVA_MAINIndústrias extrativas`                                                                0.000211
## `GVA_MAINPecuária, inclusive apoio à pecuária`                                                 0.079260
## `GVA_MAINProdução florestal, pesca e aquicultura`                                                    NA
## ALT                                                                                            0.583182
## AREA                                                                                           0.150749
## IDHM                                                                                            < 2e-16
## POP_DENSITY                                                                                    0.113508
## COMP_TOT                                                                                       0.013763
##                                                                                                   
## (Intercept)                                                                                       
## RES_FOREIGN_POP_RATIO                                                                             
## DOM_URBAN_RATIO                                                                                   
## POP_WORKING_RATIO                                                                              ***
## POP_ELDERLY_RATIO                                                                              ***
## GVA_AGROPEC_RATIO                                                                              ***
## GVA_INDUSTRY_RATIO                                                                             ***
## GVA_SERVICES_RATIO                                                                             ***
## GVA_PUBLIC_RATIO                                                                                  
## COM_A_RATIO                                                                                       
## COM_B_RATIO                                                                                       
## COM_C_RATIO                                                                                       
## COM_D_RATIO                                                                                       
## COM_E_RATIO                                                                                       
## COM_F_RATIO                                                                                       
## COM_G_RATIO                                                                                       
## COM_H_RATIO                                                                                       
## COM_I_RATIO                                                                                       
## COM_J_RATIO                                                                                       
## COM_K_RATIO                                                                                       
## COM_L_RATIO                                                                                       
## COM_M_RATIO                                                                                       
## COM_N_RATIO                                                                                       
## COM_O_RATIO                                                                                       
## COM_P_RATIO                                                                                       
## COM_Q_RATIO                                                                                       
## COM_R_RATIO                                                                                       
## COM_S_RATIO                                                                                       
## COM_U_RATIO                                                                                       
## POP_YOUTH_RATIO                                                                                   
## CAPITAL                                                                                        *  
## `RURAL_URBANIntermediário Adjacente`                                                              
## `RURAL_URBANIntermediário Remoto`                                                              *  
## `RURAL_URBANRural Adjacente`                                                                   *  
## `RURAL_URBANRural Remoto`                                                                      ***
## RURAL_URBANUrbano                                                                                 
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                 ** 
## `GVA_MAINAgricultura, inclusive apoio à agricultura e a pós colheita`                             
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                          ***
## GVA_MAINConstrução                                                                                
## `GVA_MAINDemais serviços`                                                                      ** 
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` ***
## `GVA_MAINIndústrias de transformação`                                                          ***
## `GVA_MAINIndústrias extrativas`                                                                ***
## `GVA_MAINPecuária, inclusive apoio à pecuária`                                                 .  
## `GVA_MAINProdução florestal, pesca e aquicultura`                                                 
## ALT                                                                                               
## AREA                                                                                              
## IDHM                                                                                           ***
## POP_DENSITY                                                                                       
## COMP_TOT                                                                                       *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14600 on 5518 degrees of freedom
## Multiple R-squared:  0.488,  Adjusted R-squared:  0.4838 
## F-statistic: 116.9 on 45 and 5518 DF,  p-value: < 2.2e-16

5.2.2 Interpretation of Regression

The F-statistic (116.9 on 45 and 5518 degrees of freedom) has a p-value below 2.2e-16, far under the 0.05 threshold. We can therefore reject the null hypothesis that an intercept-only model, which predicts the mean GDP per capita for every municipality, fits the data as well as our model does.
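To make the overall F-test concrete, the sketch below fits a model on simulated data (not the Brazil dataset; `toy`, `x1`, `x2` and the coefficients are made-up for illustration). It shows that the p-value on the last line of `summary()` is the same test as comparing the fitted model against an intercept-only model with `anova()`.

```r
# Simulated data: names and coefficients are illustrative assumptions only
set.seed(42)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 5 + 2 * toy$x1 - 1 * toy$x2 + rnorm(200)

full_fit <- lm(y ~ x1 + x2, data = toy)  # model under test
null_fit <- lm(y ~ 1, data = toy)        # intercept-only model: predicts mean(y)

# F-statistic and its degrees of freedom as stored by summary.lm()
f <- summary(full_fit)$fstatistic        # named vector: value, numdf, dendf
p_from_summary <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# The same test expressed as a comparison of two nested models
nested_test <- anova(null_fit, full_fit)
p_from_anova <- nested_test[["Pr(>F)"]][2]

print(c(summary = p_from_summary, anova = p_from_anova))
```

Applied to this report, the same two calls on `GDPPC_sig.mlr` and an intercept-only fit of `GDP_CAPITA` should reproduce the F-statistic printed by `summary()`.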

None of the company-type ratios (COM_A_RATIO to COM_S_RATIO) contributes significantly to GDP per capita. The altitude and area of the municipality are not significant either, and the same holds for population density, the ratio of foreign residents in the population and the share of urban households. Three of the GVA main categories (Agricultura, Construção and Pecuária) are also not statistically significant at the 0.05 level, so we will remove them. Lastly, the urban/rural classifications show some significance except for Intermediário Adjacente (p = 0.616), probably because its definition falls in between several of the others; we keep all four classification dummies for now. Rows reported as NA (for example GVA_PUBLIC_RATIO and POP_YOUTH_RATIO) are terms that lm() could not estimate because they are perfectly collinear with other predictors.
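Reading significance off a long printed table is error-prone, so the screening above can also be done programmatically. The following is a minimal sketch on simulated data (the data frame `toy` and its columns are hypothetical, not part of this exercise). It lists the terms that `lm()` could not estimate and the terms whose p-value is at or above 0.05.

```r
# Simulated data: all names here are illustrative assumptions
set.seed(1)
toy <- data.frame(a = rnorm(300), b = rnorm(300), noise = rnorm(300))
toy$c <- 1 - toy$a          # perfectly collinear with `a` and the intercept
toy$y <- 3 + 4 * toy$a + 2 * toy$b + rnorm(300)

fit <- lm(y ~ a + b + noise + c, data = toy)

# coef(fit) keeps non-estimable (aliased) terms as NA;
# these correspond to the NA rows in the printed summary
aliased_terms <- names(coef(fit))[is.na(coef(fit))]

# coef(summary(fit)) holds only the estimable terms; drop the intercept row
coef_table <- coef(summary(fit))
p_values <- coef_table[rownames(coef_table) != "(Intercept)", "Pr(>|t|)"]
not_significant <- names(p_values)[p_values >= 0.05]

print(aliased_terms)
print(not_significant)
```

The same three lines applied to `GDPPC.mlr`-style objects would return the candidate variables to drop, which can then be checked against the printed table.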

5.2.3 Selecting Significant Indicators

# Keep the identifier, the dependent variable and the indicators retained
# from the first regression (column order matters for the indexing used later)
Brazil_sig_Indic.sf <- Brazil_Indicators.sf %>%
  select(
    "CITY_STATE", "GDP_CAPITA",
    "POP_WORKING_RATIO", "POP_ELDERLY_RATIO",
    "GVA_AGROPEC_RATIO", "GVA_INDUSTRY_RATIO", "GVA_SERVICES_RATIO",
    "CAPITAL",
    "RURAL_URBANIntermediário Adjacente", "RURAL_URBANIntermediário Remoto",
    "RURAL_URBANRural Adjacente", "RURAL_URBANRural Remoto",
    "GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social",
    "GVA_MAINComércio e reparação de veículos automotores e motocicletas",
    "GVA_MAINDemais serviços",
    "GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação",
    "GVA_MAINIndústrias de transformação",
    "GVA_MAINIndústrias extrativas",
    "IDHM", "COMP_TOT"
  )

5.2.4 Running Regression on Significant Indicators

# Columns 2:21 skip CITY_STATE (column 1); the geometry column is dropped before fitting
GDPPC_sig.mlr <- lm(GDP_CAPITA ~ ., data=Brazil_sig_Indic.sf[2:21] %>% st_set_geometry(NULL))
summary(GDPPC_sig.mlr)
## 
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:21] %>% 
##     st_set_geometry(NULL))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -42585  -5379   -942   3078 252473 
## 
## Coefficients:
##                                                                                                Estimate
## (Intercept)                                                                                    -13666.5
## POP_WORKING_RATIO                                                                               28125.7
## POP_ELDERLY_RATIO                                                                              -43172.8
## GVA_AGROPEC_RATIO                                                                                8705.5
## GVA_INDUSTRY_RATIO                                                                              22762.2
## GVA_SERVICES_RATIO                                                                               5337.2
## CAPITAL                                                                                         -4722.8
## `RURAL_URBANIntermediário Adjacente`                                                            -1085.7
## `RURAL_URBANIntermediário Remoto`                                                                4783.1
## `RURAL_URBANRural Adjacente`                                                                     1129.0
## `RURAL_URBANRural Remoto`                                                                        4452.0
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                 -11208.5
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                           21417.8
## `GVA_MAINDemais serviços`                                                                       -9095.2
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`  19564.9
## `GVA_MAINIndústrias de transformação`                                                           11273.0
## `GVA_MAINIndústrias extrativas`                                                                 15435.8
## IDHM                                                                                            39405.4
## COMP_TOT                                                                                        55830.7
##                                                                                                Std. Error
## (Intercept)                                                                                        6122.4
## POP_WORKING_RATIO                                                                                 10242.9
## POP_ELDERLY_RATIO                                                                                  7379.9
## GVA_AGROPEC_RATIO                                                                                  1296.0
## GVA_INDUSTRY_RATIO                                                                                 1636.9
## GVA_SERVICES_RATIO                                                                                 1153.6
## CAPITAL                                                                                            3279.7
## `RURAL_URBANIntermediário Adjacente`                                                                732.7
## `RURAL_URBANIntermediário Remoto`                                                                  2012.5
## `RURAL_URBANRural Adjacente`                                                                        627.3
## `RURAL_URBANRural Remoto`                                                                          1037.5
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                      707.0
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                              2295.3
## `GVA_MAINDemais serviços`                                                                           774.5
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`     1733.3
## `GVA_MAINIndústrias de transformação`                                                              1222.3
## `GVA_MAINIndústrias extrativas`                                                                    2652.6
## IDHM                                                                                               2416.5
## COMP_TOT                                                                                          14553.1
##                                                                                                t value
## (Intercept)                                                                                     -2.232
## POP_WORKING_RATIO                                                                                2.746
## POP_ELDERLY_RATIO                                                                               -5.850
## GVA_AGROPEC_RATIO                                                                                6.717
## GVA_INDUSTRY_RATIO                                                                              13.906
## GVA_SERVICES_RATIO                                                                               4.627
## CAPITAL                                                                                         -1.440
## `RURAL_URBANIntermediário Adjacente`                                                            -1.482
## `RURAL_URBANIntermediário Remoto`                                                                2.377
## `RURAL_URBANRural Adjacente`                                                                     1.800
## `RURAL_URBANRural Remoto`                                                                        4.291
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                 -15.853
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                            9.331
## `GVA_MAINDemais serviços`                                                                      -11.743
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`  11.288
## `GVA_MAINIndústrias de transformação`                                                            9.223
## `GVA_MAINIndústrias extrativas`                                                                  5.819
## IDHM                                                                                            16.307
## COMP_TOT                                                                                         3.836
##                                                                                                Pr(>|t|)
## (Intercept)                                                                                    0.025639
## POP_WORKING_RATIO                                                                              0.006054
## POP_ELDERLY_RATIO                                                                              5.19e-09
## GVA_AGROPEC_RATIO                                                                              2.04e-11
## GVA_INDUSTRY_RATIO                                                                              < 2e-16
## GVA_SERVICES_RATIO                                                                             3.80e-06
## CAPITAL                                                                                        0.149924
## `RURAL_URBANIntermediário Adjacente`                                                           0.138455
## `RURAL_URBANIntermediário Remoto`                                                              0.017506
## `RURAL_URBANRural Adjacente`                                                                   0.071941
## `RURAL_URBANRural Remoto`                                                                      1.81e-05
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                  < 2e-16
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                           < 2e-16
## `GVA_MAINDemais serviços`                                                                       < 2e-16
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação`  < 2e-16
## `GVA_MAINIndústrias de transformação`                                                           < 2e-16
## `GVA_MAINIndústrias extrativas`                                                                6.25e-09
## IDHM                                                                                            < 2e-16
## COMP_TOT                                                                                       0.000126
##                                                                                                   
## (Intercept)                                                                                    *  
## POP_WORKING_RATIO                                                                              ** 
## POP_ELDERLY_RATIO                                                                              ***
## GVA_AGROPEC_RATIO                                                                              ***
## GVA_INDUSTRY_RATIO                                                                             ***
## GVA_SERVICES_RATIO                                                                             ***
## CAPITAL                                                                                           
## `RURAL_URBANIntermediário Adjacente`                                                              
## `RURAL_URBANIntermediário Remoto`                                                              *  
## `RURAL_URBANRural Adjacente`                                                                   .  
## `RURAL_URBANRural Remoto`                                                                      ***
## `GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social`                 ***
## `GVA_MAINComércio e reparação de veículos automotores e motocicletas`                          ***
## `GVA_MAINDemais serviços`                                                                      ***
## `GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação` ***
## `GVA_MAINIndústrias de transformação`                                                          ***
## `GVA_MAINIndústrias extrativas`                                                                ***
## IDHM                                                                                           ***
## COMP_TOT                                                                                       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14840 on 5545 degrees of freedom
## Multiple R-squared:  0.4685, Adjusted R-squared:  0.4668 
## F-statistic: 271.5 on 18 and 5545 DF,  p-value: < 2.2e-16

In the new regression, some variables are no longer significant at the 0.05 level: CAPITAL (p = 0.150) and the two Adjacente classifications, Intermediário Adjacente (p = 0.138) and Rural Adjacente (p = 0.072). We will run the regression again without them.

5.2.4.1 Removing newly non-significant variables

dropsInsig <- c("CAPITAL", "RURAL_URBANIntermediário Adjacente", "RURAL_URBANRural Adjacente")

Brazil_sig_Indic.sf <- Brazil_sig_Indic.sf[ , !(names(Brazil_sig_Indic.sf) %in% dropsInsig)] 
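
One consequence of dropping individual dummy columns, rather than a whole categorical variable, is that observations in the dropped classes join the baseline group, so the remaining coefficients are measured against a broader reference category. The sketch below illustrates this on simulated data; the three classes and the column names `adjacente` and `remoto` are stand-ins chosen for illustration, not the actual Brazil indicators.

```r
# Simulated data: three made-up classes stand in for the rural/urban categories
set.seed(3)
class_levels <- c("Urbano", "Rural Adjacente", "Rural Remoto")
area_class <- sample(class_levels, 300, replace = TRUE)

toy <- data.frame(
  adjacente = as.numeric(area_class == "Rural Adjacente"),
  remoto    = as.numeric(area_class == "Rural Remoto")
)
toy$y <- 10 + 2 * toy$remoto + rnorm(300)

fit_both   <- lm(y ~ adjacente + remoto, data = toy)  # baseline: Urbano only
fit_remoto <- lm(y ~ remoto, data = toy)              # baseline: Urbano + Rural Adjacente

# With dummies as the only predictors, each intercept equals
# the mean of y in that model's baseline group
baseline_both   <- mean(toy$y[toy$adjacente == 0 & toy$remoto == 0])
baseline_remoto <- mean(toy$y[toy$remoto == 0])

print(c(intercept = unname(coef(fit_both)[1]),   baseline_mean = baseline_both))
print(c(intercept = unname(coef(fit_remoto)[1]), baseline_mean = baseline_remoto))
```

In the model above, this means the remaining Remoto coefficients should be read relative to all municipalities outside those two classes, including the Adjacente ones that were dropped.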

5.2.4.2 Renaming variables for further processing

# Replace the long Portuguese dummy-variable names with short English ones.
# Note: 'Indústrias de transformação' is manufacturing and
# 'Indústrias extrativas' is extractive industry (mining, oil and gas).
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'RURAL_URBANIntermediário Remoto'] <- 'CAT_INTERMEDIATE_REMOTE'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'RURAL_URBANRural Remoto'] <- 'CAT_RURAL_REMOTE'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINAdministração, defesa, educação e saúde públicas e seguridade social'] <- 'GVA_MAIN_Public_Sector'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINComércio e reparação de veículos automotores e motocicletas'] <- 'GVA_MAIN_Commercial'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINDemais serviços'] <- 'GVA_MAIN_Other_services'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINEletricidade e gás, água, esgoto, atividades de gestão de resíduos e descontaminação'] <- 'GVA_MAIN_Public_Utilities'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINIndústrias de transformação'] <- 'GVA_MAIN_Industry_transformation'
names(Brazil_sig_Indic.sf)[names(Brazil_sig_Indic.sf) == 'GVA_MAINIndústrias extrativas'] <- 'GVA_MAIN_Industrial'

5.2.4.3 Rerunning the regression

# Columns 2:18 skip CITY_STATE (column 1); the geometry column is dropped before fitting
GDPPC_sig2.mlr <- lm(GDP_CAPITA ~ ., data=Brazil_sig_Indic.sf[2:18] %>% st_set_geometry(NULL))
summary(GDPPC_sig2.mlr)
## 
## Call:
## lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:18] %>% 
##     st_set_geometry(NULL))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -42091  -5367   -884   3055 252671 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -13922.8     6105.4  -2.280 0.022622 *  
## POP_WORKING_RATIO                 29840.1    10241.3   2.914 0.003586 ** 
## POP_ELDERLY_RATIO                -37947.9     6994.2  -5.426 6.02e-08 ***
## GVA_AGROPEC_RATIO                  9008.2     1287.0   6.999 2.88e-12 ***
## GVA_INDUSTRY_RATIO                22507.7     1633.3  13.780  < 2e-16 ***
## GVA_SERVICES_RATIO                 4989.7     1148.6   4.344 1.42e-05 ***
## CAT_INTERMEDIATE_REMOTE            4304.2     1967.3   2.188 0.028725 *  
## CAT_RURAL_REMOTE                   3815.4      908.6   4.199 2.72e-05 ***
## GVA_MAIN_Public_Sector           -11314.6      706.3 -16.019  < 2e-16 ***
## GVA_MAIN_Commercial               21167.5     2293.4   9.230  < 2e-16 ***
## GVA_MAIN_Other_services           -9509.9      751.3 -12.657  < 2e-16 ***
## GVA_MAIN_Public_Utilities         19421.8     1734.1  11.200  < 2e-16 ***
## GVA_MAIN_Industry_transformation  11100.7     1220.7   9.094  < 2e-16 ***
## GVA_MAIN_Industrial               15348.7     2654.9   5.781 7.82e-09 ***
## IDHM                              38163.3     2351.1  16.232  < 2e-16 ***
## COMP_TOT                          46378.4    12894.6   3.597 0.000325 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14850 on 5548 degrees of freedom
## Multiple R-squared:  0.4671, Adjusted R-squared:  0.4657 
## F-statistic: 324.2 on 15 and 5548 DF,  p-value: < 2.2e-16

In the final regression, the adjusted R-squared is 0.4657, which is quite low: more than half of the variation in GDP per capita is still unexplained by our model. The adjusted R-squared has also decreased slightly (from 0.4668) as we refined the model. The F-statistic (324.2 on 15 and 5548 DF, p-value < 2.2e-16) still lets us reject the null hypothesis that an intercept-only model, which predicts the mean for every municipality, explains the dependent variable as well as our model does.
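As a cross-check, the adjusted R-squared and the F-statistic can be recomputed from the values printed in the summary. The sketch below uses only those rounded numbers; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the summary output above (rounded as printed)
r2 <- 0.4671   # Multiple R-squared
n  <- 5564     # observations: 5548 residual df + 15 predictors + intercept
p  <- 15       # predictors, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(adj_r2, 4))
print(round(f_stat, 1))
```

Up to rounding of the inputs, both results should agree with the figures reported by summary().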

5.2.5 Interpretation of results

Based on the regression above, which we validate below, each retained variable has a statistically significant association with GDP per capita. Unsurprisingly, the total number of companies is significantly and positively associated with GDP per capita. This is most probably because more companies mean more jobs, so more people can be employed. To investigate further, we could examine whether the ratio of companies to population has an effect on GDP per capita.

The working population ratio has a positive correlation while the elderly ratio has a negative correlation. This is in line with the logic that a larger economically active share of the population raises GDP per capita, whereas a larger share of elderly dependents lowers it. For the Gross Value Added ratios by sector, all three contribute positively to GDP per capita, but the industry ratio has a much larger coefficient (22507.7) than the agriculture (9008.2) and services (4989.7) ratios. This is most probably due to the way in which GDP per capita is calculated, with manufacturing sectors contributing more to it than others.

The IDHM, which is our Human Development Index, is also positively correlated with GDP per capita. However, it is not certain that this is a causal relationship, as there might be reverse causality: higher GDP per capita often leads to better outcomes in life. But because the IDHM was recorded in 2010 and the GDP per capita in 2016, we can more safely say that a higher HDI might lead to greater GDP per capita for the people.

In terms of our categorical variables, being classified as a Rural Remote or Intermediate Remote region is positively correlated with GDP per capita. This broadly matches our choropleth map, which showed inland areas with higher GDP per capita than areas one would expect to be more urbanized. This could be due to a lower population in these remote areas and a greater focus on industrial or manufacturing jobs.

Interestingly, the labeling of the main sector for Gross Value Added shows that municipalities whose main sector is public services (public administration, defense, education, health and social security) are negatively correlated with GDP per capita. This may be due to municipalities being specialized for certain government functions. Other services show the same negative correlation, although it is not clear why this is the case. As expected, municipalities whose main economic activity is commercial have the largest positive coefficient, but surprisingly public utilities (electricity and gas, water, sewage, waste management and decontamination activities) come in close behind, ahead of municipalities labelled as extractive industry or industrial transformation.

5.2.6 Clearing redundant explanatory variables

VIF <- ols_vif_tol(GDPPC_sig2.mlr)
VIF
##                           Variables Tolerance      VIF
## 1                 POP_WORKING_RATIO 0.3671431 2.723734
## 2                 POP_ELDERLY_RATIO 0.7191852 1.390463
## 3                 GVA_AGROPEC_RATIO 0.5633825 1.774993
## 4                GVA_INDUSTRY_RATIO 0.5075373 1.970299
## 5                GVA_SERVICES_RATIO 0.6316755 1.583091
## 6           CAT_INTERMEDIATE_REMOTE 0.9602515 1.041394
## 7                  CAT_RURAL_REMOTE 0.8783178 1.138540
## 8            GVA_MAIN_Public_Sector 0.3180169 3.144487
## 9               GVA_MAIN_Commercial 0.9193246 1.087755
## 10          GVA_MAIN_Other_services 0.3603482 2.775094
## 11        GVA_MAIN_Public_Utilities 0.7619520 1.312419
## 12 GVA_MAIN_Industry_transformation 0.5950853 1.680431
## 13              GVA_MAIN_Industrial 0.8998277 1.111324
## 14                             IDHM 0.2730812 3.661915
## 15                         COMP_TOT 0.9651462 1.036112

As the VIF analysis shows, every variable has a VIF well below the commonly used threshold of 10 (the largest is 3.66, for IDHM), so none of them is redundant. This is consistent with the correlation analysis done earlier.
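To make the two columns of the table concrete: the tolerance of a predictor is 1 minus the R-squared obtained when that predictor is regressed on all the other predictors, and the VIF is the reciprocal of the tolerance. The sketch below computes both by hand in base R. It uses the built-in mtcars data as a stand-in, since the municipality data is not reproduced here, and the choice of columns is purely illustrative.

```r
# Illustrative data: mtcars stands in for the municipality indicators
fit_data   <- mtcars[, c("mpg", "wt", "hp", "disp")]
predictors <- c("wt", "hp", "disp")

vif_table <- do.call(rbind, lapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Auxiliary regression: predictor v explained by the remaining predictors
  aux <- lm(reformulate(others, response = v), data = fit_data)
  tolerance <- 1 - summary(aux)$r.squared
  data.frame(Variables = v, Tolerance = tolerance, VIF = 1 / tolerance)
}))

print(vif_table)
```

A tolerance close to 1 (VIF close to 1) means the predictor is barely explained by the others, which is the situation for most variables in the table above.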

5.2.7 Testing for Non-Linearity in model

ols_plot_resid_fit(GDPPC_sig2.mlr)

From the plot above, we can see that the residuals are scattered fairly evenly around the zero line. This suggests that the model meets the linearity assumption required for multiple linear regression. Additionally, there do not seem to be any obvious signs of heteroscedasticity in the plot.

5.2.8 Test for Normality Assumption

ols_plot_resid_hist(GDPPC_sig2.mlr)

The figure reveals that the residuals of the multiple linear regression model resemble a normal distribution, which supports the normality assumption. We would normally use ols_test_normality() to test this assumption further, but the function is limited to sample sizes between 3 and 5000 and we have 5564 observations. We will therefore skip this step and rely on the plot as evidence that the normality assumption holds.
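One possible workaround, which this analysis does not use, is to run a Shapiro-Wilk test on a random subsample of at most 5000 residuals, since base R's shapiro.test() accepts between 3 and 5000 values. The sketch below is only an illustration: the residuals are simulated stand-ins (in practice they would come from residuals(GDPPC_sig2.mlr)), and the seed and the subsample size are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the sketch

# Simulated stand-in for the 5564 model residuals (assumption, not real data)
res <- rnorm(5564, mean = 0, sd = 14850)

# shapiro.test() rejects inputs longer than 5000, so draw a random subsample
res_sub <- sample(res, size = 5000)

sw <- shapiro.test(res_sub)
print(sw$statistic)
print(sw$p.value)
```

With samples this large the test is very sensitive to small departures from normality, so its result is best read alongside the histogram rather than instead of it.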

5.3 Testing for Spatial Autocorrelation

The model we built is using geographically referenced attributes, hence it is also important for us to visualize the residuals of the model in order to rule out spatial autocorrelation.

mlr.output <- as.data.frame(GDPPC_sig2.mlr$residuals)
Brazil_residual.sf <- cbind(Brazil_sig_Indic.sf, 
                        GDPPC_sig2.mlr$residuals) %>%
rename(`MLR_RES` = `GDPPC_sig2.mlr.residuals`)

5.3.1 Plotting Choropleth Map of GDP per Capita Residuals

tmap_mode("plot")
tm_shape(Brazil_residual.sf)+
  tm_fill("MLR_RES",
          n = 6,
          style = "quantile",
          palette = "RdYlBu" ) +
  tm_borders(alpha = 0.5)

From our map of the residuals, there is no clear sign of clustering or of any other geospatial pattern in their distribution. However, we can test this formally using the Moran’s I test.

5.3.2 Building Nearest Neighbours matrix

For this, we will use the point locations of the municipalities, since we already have them. We assume that their row order still matches that of the regression data, as we have not sorted either dataset.
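The ordering assumption can be checked rather than taken on trust. The sketch below uses two toy data frames as stand-ins for the point layer and the indicator table. It assumes both carry a shared key column named CITY_STATE, which is an assumption about the real objects, and all values are hypothetical.

```r
# Toy stand-ins for the point layer and the indicator table (hypothetical values)
cities <- data.frame(CITY_STATE = c("A_GO", "B_MG", "C_PA"),
                     stringsAsFactors = FALSE)
indicators <- data.frame(CITY_STATE = c("C_PA", "A_GO", "B_MG"),
                         GDP_CAPITA = c(30, 10, 20),
                         stringsAsFactors = FALSE)

# TRUE only when both tables list the municipalities in the same order
same_order <- identical(cities$CITY_STATE, indicators$CITY_STATE)

# If the order differs, realign the indicators to the order of the points
if (!same_order) {
  indicators <- indicators[match(cities$CITY_STATE, indicators$CITY_STATE), ]
}

aligned <- identical(cities$CITY_STATE, indicators$CITY_STATE)
print(aligned)
```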

Brazil_cities.sp <- as_Spatial(Brazil_cities.sf)
#st_crs(Brazil_cities.sf)
proj4string(Brazil_cities.sp)
## [1] "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"

5.3.2.1 Calculating maximum distance between points

coords <- coordinates(Brazil_cities.sp)
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.6029   9.1046  13.1276  17.0081  19.7337 363.0083
nb <- dnearneigh(coordinates(Brazil_cities.sp), 0, 364, longlat = TRUE)
nb_lw <- nb2listw(nb, style = 'W')
lm.morantest(GDPPC_sig2.mlr, nb_lw)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = GDP_CAPITA ~ ., data = Brazil_sig_Indic.sf[2:18]
## %>% st_set_geometry(NULL))
## weights: nb_lw
## 
## Moran I statistic standard deviate = 0.69146, p-value = 0.2446
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     6.391678e-04    -1.799828e-04     1.403444e-06

Based on our global Moran’s I test, the p-value (0.2446) is above 0.05, which means we are unable to reject the null hypothesis that the residuals are randomly distributed in space. We therefore find no evidence of spatial autocorrelation in the regression residuals, which allows us to trust the correlations in our model a little more.
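The standard deviate and p-value reported by the test follow directly from the three sample estimates: the observed Moran's I is centred on its expectation, scaled by the square root of its variance, and compared with the upper tail of a standard normal distribution (the alternative hypothesis is "greater"). A minimal sketch using only the printed values, with illustrative variable names:

```r
# Sample estimates copied from the lm.morantest() output above
observed    <- 6.391678e-04
expectation <- -1.799828e-04
variance    <- 1.403444e-06

# Standard deviate: how many standard errors the observed I lies above
# its expected value under the null hypothesis
z <- (observed - expectation) / sqrt(variance)

# One-sided p-value for the alternative hypothesis "greater"
p_value <- pnorm(z, lower.tail = FALSE)

print(round(z, 5))
print(round(p_value, 4))
```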

6 Building an explanatory model for GDP per capita using GWmodel

We will try to refine our regression using the GWmodel package.

6.1 Joining Variables to spatial data points

Joint_sf <- left_join(Brazil_cities.sf[,1], Brazil_sig_Indic.sf %>% st_set_geometry(NULL))
Joint_sp <- as_Spatial(Joint_sf)
summary(Joint_sp@data)
##                   CITY_STATE     GDP_CAPITA     POP_WORKING_RATIO
##  Abadia De Goiás_GO    :   1   Min.   :  3191   Min.   :0.4716   
##  Abadia Dos Dourados_MG:   1   1st Qu.:  9062   1st Qu.:0.6087   
##  Abadiânia_GO          :   1   Median : 15870   Median :0.6325   
##  Abaeté_MG             :   1   Mean   : 21122   Mean   :0.6308   
##  Abaetetuba_PA         :   1   3rd Qu.: 26155   3rd Qu.:0.6543   
##  Abaiara_CE            :   1   Max.   :314638   Max.   :0.7448   
##  (Other)               :5558                                     
##  POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO  GVA_SERVICES_RATIO 
##  Min.   :0.02255   Min.   :0.00000   Min.   :0.0000157   Min.   :0.0000461  
##  1st Qu.:0.09799   1st Qu.:0.03364   1st Qu.:0.0368730   1st Qu.:0.1985910  
##  Median :0.11921   Median :0.15062   Median :0.0714602   Median :0.3117002  
##  Mean   :0.12009   Mean   :0.21034   Mean   :0.1377745   Mean   :0.3260963  
##  3rd Qu.:0.14103   3rd Qu.:0.34094   3rd Qu.:0.1795132   3rd Qu.:0.4600063  
##  Max.   :0.42199   Max.   :0.99877   Max.   :0.9991868   Max.   :0.9995977  
##                                                                             
##  CAT_INTERMEDIATE_REMOTE CAT_RURAL_REMOTE  GVA_MAIN_Public_Sector
##  Min.   :0.00000         Min.   :0.00000   Min.   :0.0000        
##  1st Qu.:0.00000         1st Qu.:0.00000   1st Qu.:0.0000        
##  Median :0.00000         Median :0.00000   Median :0.0000        
##  Mean   :0.01078         Mean   :0.05805   Mean   :0.4892        
##  3rd Qu.:0.00000         3rd Qu.:0.00000   3rd Qu.:1.0000        
##  Max.   :1.00000         Max.   :1.00000   Max.   :1.0000        
##                                                                  
##  GVA_MAIN_Commercial GVA_MAIN_Other_services GVA_MAIN_Public_Utilities
##  Min.   :0.000000    Min.   :0.0000          Min.   :0.00000          
##  1st Qu.:0.000000    1st Qu.:0.0000          1st Qu.:0.00000          
##  Median :0.000000    Median :0.0000          Median :0.00000          
##  Mean   :0.008267    Mean   :0.2653          Mean   :0.01761          
##  3rd Qu.:0.000000    3rd Qu.:1.0000          3rd Qu.:0.00000          
##  Max.   :1.000000    Max.   :1.0000          Max.   :1.00000          
##                                                                       
##  GVA_MAIN_Industry_transformation GVA_MAIN_Industrial      IDHM       
##  Min.   :0.00000                  Min.   :0.00000     Min.   :0.0000  
##  1st Qu.:0.00000                  1st Qu.:0.00000     1st Qu.:0.4077  
##  Median :0.00000                  Median :0.00000     Median :0.5563  
##  Mean   :0.04691                  Mean   :0.00629     Mean   :0.5432  
##  3rd Qu.:0.00000                  3rd Qu.:0.00000     3rd Qu.:0.6757  
##  Max.   :1.00000                  Max.   :1.00000     Max.   :1.0000  
##                                                                       
##     COMP_TOT        
##  Min.   :0.0000000  
##  1st Qu.:0.0001169  
##  Median :0.0002941  
##  Mean   :0.0016997  
##  3rd Qu.:0.0008356  
##  Max.   :1.0000000  
## 

Building Fixed Bandwidth GWR Model

We will be using a fixed bandwidth here due to the varying nature of the polygons in Brazil.
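To illustrate what a fixed bandwidth does, the sketch below evaluates a Gaussian kernel, the kernel named in the calls in this section, at a few distances. The weight formula exp(-0.5 * (d / bw)^2) follows the Gaussian kernel definition given in the GWmodel documentation; the helper name gaussian_weight and the example distances are illustrative assumptions.

```r
# Gaussian kernel weight for a distance d and bandwidth bw (same units, here km).
# gaussian_weight is a hypothetical helper, not a GWmodel function.
gaussian_weight <- function(d, bw) exp(-0.5 * (d / bw)^2)

bw <- 364                         # fixed bandwidth in km, as used in this section
d  <- c(0, 100, 364, 728, 1500)   # example distances in km (illustrative)

w <- gaussian_weight(d, bw)
print(data.frame(distance_km = d, weight = round(w, 4)))
```

With a fixed bandwidth the same distance decay applies at every regression point, so a municipality in a sparsely populated region draws on far fewer nearby observations than one in a dense region.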

#bw.fixed <- bw.gwr(formula = GDP_CAPITA ~  POP_WORKING_RATIO + POP_ELDERLY_RATIO + GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO + CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector + GVA_MAIN_Commercial + GVA_MAIN_Other_services+ GVA_MAIN_Public_Utilities + GVA_MAIN_Industry_transformation +  GVA_MAIN_Industrial + IDHM + COMP_TOT, data=Joint_sp, approach= "AIC", kernel="gaussian", adaptive=FALSE, longlat=TRUE)

# Could not resolve the issue

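Since the bandwidth selection above could not be completed, we take the 364 km distance threshold established earlier as the fixed bandwidth.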

gwr.fixed <- gwr.basic(formula = GDP_CAPITA ~  POP_WORKING_RATIO + POP_ELDERLY_RATIO + GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO + CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector + GVA_MAIN_Commercial + GVA_MAIN_Other_services+ GVA_MAIN_Public_Utilities + GVA_MAIN_Industry_transformation +  GVA_MAIN_Industrial + IDHM + COMP_TOT, data=Joint_sp, bw=364, kernel = 'gaussian', longlat = TRUE)

gwr.fixed
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-06-01 00:21:15 
##    Call:
##    gwr.basic(formula = GDP_CAPITA ~ POP_WORKING_RATIO + POP_ELDERLY_RATIO + 
##     GVA_AGROPEC_RATIO + GVA_INDUSTRY_RATIO + GVA_SERVICES_RATIO + 
##     CAT_INTERMEDIATE_REMOTE + CAT_RURAL_REMOTE + GVA_MAIN_Public_Sector + 
##     GVA_MAIN_Commercial + GVA_MAIN_Other_services + GVA_MAIN_Public_Utilities + 
##     GVA_MAIN_Industry_transformation + GVA_MAIN_Industrial + 
##     IDHM + COMP_TOT, data = Joint_sp, bw = 364, kernel = "gaussian", 
##     longlat = TRUE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  POP_WORKING_RATIO POP_ELDERLY_RATIO GVA_AGROPEC_RATIO GVA_INDUSTRY_RATIO GVA_SERVICES_RATIO CAT_INTERMEDIATE_REMOTE CAT_RURAL_REMOTE GVA_MAIN_Public_Sector GVA_MAIN_Commercial GVA_MAIN_Other_services GVA_MAIN_Public_Utilities GVA_MAIN_Industry_transformation GVA_MAIN_Industrial IDHM COMP_TOT
##    Number of data points: 5564
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -42091  -5367   -884   3055 252671 
## 
##    Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)                      -13922.8     6105.4  -2.280 0.022622 *  
##    POP_WORKING_RATIO                 29840.1    10241.3   2.914 0.003586 ** 
##    POP_ELDERLY_RATIO                -37947.9     6994.2  -5.426 6.02e-08 ***
##    GVA_AGROPEC_RATIO                  9008.2     1287.0   6.999 2.88e-12 ***
##    GVA_INDUSTRY_RATIO                22507.7     1633.3  13.780  < 2e-16 ***
##    GVA_SERVICES_RATIO                 4989.7     1148.6   4.344 1.42e-05 ***
##    CAT_INTERMEDIATE_REMOTE            4304.2     1967.3   2.188 0.028725 *  
##    CAT_RURAL_REMOTE                   3815.4      908.6   4.199 2.72e-05 ***
##    GVA_MAIN_Public_Sector           -11314.6      706.3 -16.019  < 2e-16 ***
##    GVA_MAIN_Commercial               21167.5     2293.4   9.230  < 2e-16 ***
##    GVA_MAIN_Other_services           -9509.9      751.3 -12.657  < 2e-16 ***
##    GVA_MAIN_Public_Utilities         19421.8     1734.1  11.200  < 2e-16 ***
##    GVA_MAIN_Industry_transformation  11100.7     1220.7   9.094  < 2e-16 ***
##    GVA_MAIN_Industrial               15348.7     2654.9   5.781 7.82e-09 ***
##    IDHM                              38163.3     2351.1  16.232  < 2e-16 ***
##    COMP_TOT                          46378.4    12894.6   3.597 0.000325 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 14850 on 5548 degrees of freedom
##    Multiple R-squared: 0.4671
##    Adjusted R-squared: 0.4657 
##    F-statistic: 324.2 on 15 and 5548 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 1.223844e+12
##    Sigma(hat): 14833.64
##    AIC:  122702.5
##    AICc:  122702.6
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 364 
##    Regression points: the same locations as observations are used.
##    Distance metric: Great Circle distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                           Min.    1st Qu.     Median    3rd Qu.
##    Intercept                         -92329.16  -40715.76  -13121.18    6452.30
##    POP_WORKING_RATIO                 -50961.49    9286.58   18723.07   58946.25
##    POP_ELDERLY_RATIO                -300167.48  -53694.90  -40170.12  -24763.62
##    GVA_AGROPEC_RATIO                  -1419.48    2350.16    8997.05   20989.29
##    GVA_INDUSTRY_RATIO                 -4759.82   10065.73   29567.73   33829.25
##    GVA_SERVICES_RATIO                -10466.24    1004.91    9619.26   14927.33
##    CAT_INTERMEDIATE_REMOTE            -2499.98    1220.93    5936.90   19238.03
##    CAT_RURAL_REMOTE                   -2115.10     255.87    2455.88    5966.25
##    GVA_MAIN_Public_Sector            -18916.73  -10431.07   -8711.67   -8063.23
##    GVA_MAIN_Commercial               -14577.52   11193.58   16074.03   33082.35
##    GVA_MAIN_Other_services           -32806.76   -7849.04   -7358.40   -5494.94
##    GVA_MAIN_Public_Utilities         -21453.80   11024.04   18877.86   28535.35
##    GVA_MAIN_Industry_transformation  -62782.35    7649.03   13818.96   18882.11
##    GVA_MAIN_Industrial               -12371.02    5276.13   19584.56   23744.54
##    IDHM                                1910.09   15644.84   37478.87   48643.06
##    COMP_TOT                         -601785.36   38835.86   57729.70  102521.38
##                                          Max.
##    Intercept                          26461.8
##    POP_WORKING_RATIO                 111318.4
##    POP_ELDERLY_RATIO                  53442.4
##    GVA_AGROPEC_RATIO                  31812.6
##    GVA_INDUSTRY_RATIO                 45073.8
##    GVA_SERVICES_RATIO                 26536.2
##    CAT_INTERMEDIATE_REMOTE            40003.4
##    CAT_RURAL_REMOTE                   15040.3
##    GVA_MAIN_Public_Sector              1064.7
##    GVA_MAIN_Commercial                56781.7
##    GVA_MAIN_Other_services             4793.0
##    GVA_MAIN_Public_Utilities          38054.1
##    GVA_MAIN_Industry_transformation   27756.8
##    GVA_MAIN_Industrial                34343.3
##    IDHM                              132568.6
##    COMP_TOT                         1189196.5
##    ************************Diagnostic information*************************
##    Number of data points: 5564 
##    Effective number of parameters (2trace(S) - trace(S'S)): 215.9051 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5348.095 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122049.3 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 121879.2 
##    Residual sum of squares: 1.032151e+12 
##    R-square value:  0.5506 
##    Adjusted R-square value:  0.5324541 
## 
##    ***********************************************************************
##    Program stops at: 2020-06-01 00:21:59

6.2 Interpretation of Results

By using the maximum nearest-neighbour distance established earlier as the bandwidth, the R-square value has gone up from 0.4671 to 0.5506 (adjusted: 0.4657 to 0.5325) and the AICc has fallen from 122702.6 to 122049.3, which means that the geographically weighted method has produced a better-fitting model overall. However, we need to check the geographic distribution of the local R-square below.
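The GWR R-square can be reconciled with the residual sums of squares printed in the output. The sketch below recovers the total sum of squares from the global model and then derives the GWR R-square from it. It uses only the printed, rounded values, and the variable names are illustrative.

```r
# Values copied from the gwr.basic() output above
rss_global <- 1.223844e12   # residual sum of squares, global regression
r2_global  <- 0.4671        # multiple R-squared, global regression
rss_gwr    <- 1.032151e12   # residual sum of squares, GWR

# Total sum of squares implied by the global model: R2 = 1 - RSS / TSS
tss <- rss_global / (1 - r2_global)

# GWR R-squared computed against the same total sum of squares
r2_gwr <- 1 - rss_gwr / tss

# Improvement in AICc (lower is better)
aicc_drop <- 122702.6 - 122049.3

print(round(r2_gwr, 4))
print(aicc_drop)
```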

7 Visualising GWR Output

7.1 Converting SDF into sf data.frame

GWR.sf <- st_as_sf(gwr.fixed$SDF) %>%
  st_transform(4674)

GWR.sf.transformed <- st_transform(GWR.sf, 4674)

gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)

Brazil_sig_Indic.sf.fixed <- cbind(Brazil_sig_Indic.sf, as.matrix(gwr.fixed.output))

range(Brazil_sig_Indic.sf.fixed$Local_R2)
## [1] 0.4524265 0.9703265
summary(gwr.fixed$SDF$yhat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -15511    8943   17909   21388   29558  104354

7.2 Visualising local R2

tm_shape(Brazil_sig_Indic.sf.fixed) +  
  tm_fill(col = "Local_R2",
          style = "jenks",
           palette = "Greens",
          title = "R-squared Values")

7.2.1 Interpretation

As we can see, there does not seem to be any clear pattern in the distribution of the local R-squared values, which range from 0.45 to 0.97. Although the model does seem to explain some areas better than others, it is not clear why this is the case.