https://rpubs.com/Huiling/take-home_ex04

The maps in this report are published as static maps.

1 Overview

In this take-home exercise, you are tasked to segment Singapore at the planning subzone level into homogeneous socioeconomic areas by combining geodemographic data extracted from Singapore Department of Statistics and urban functions extracted from the geospatial data provided.

1.1 Data

To provide answers to the questions above, the data sets used are:

2 Loading of packages

# Install any package that is missing, then attach it.
packages <- c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'dplyr', 'rgdal')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3 Data Importing and Preparation

3.1 Importing aspatial data into R environment

brazil_cities <- read_csv("data/aspatial/BRAZIL_CITIES.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   CITY = col_character(),
##   STATE = col_character(),
##   REGIAO_TUR = col_character(),
##   CATEGORIA_TUR = col_character(),
##   RURAL_URBAN = col_character(),
##   GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.

3.2 Filter required variables

I have selected the following variables from the table as listed below:

Dependent Variable:

  • Gross Domestic Product per capita (GDP_CAPITA) will be used to identify the GDP per capita of each municipality.

Independent Variables:

  • Name of the City (CITY) and Name of the State (STATE) will be used to identify each municipality. Two variables are needed instead of one because each of them contains duplicates. I will merge them into a single variable for visualisation.
  • If Capital (CAPITAL) will be used to identify whether the municipality is a state capital. The reason is that the capital of a state usually has a higher GDP.
  • Resident Population (IBGE_RES_POP), Resident Population Brazilian (IBGE_RES_POP_BRAS), Resident Population Foreigners (IBGE_RES_POP_ESTR), Resident Population from 15 to 59 y.o. (IBGE_15-59) and City Area (AREA) will be used to determine the population density as well as the share of Brazilian, foreign and working-age residents in each municipality.
  • Domestic Units Total (IBGE_DU), Domestic Units Urban (IBGE_DU_URBAN) and Domestic Units Rural (IBGE_DU_RURAL) will be used to get the share of urban and rural domestic units in each municipality.
  • Planted Area (IBGE_PLANTED_AREA) and Crop Production (IBGE_CROP_PRODUCTION_$) will be used to get the crop production per hectare for each municipality.
  • HDI Education index (IDHM_Educacao) will be used to identify the education level of each municipality. The reason is that municipalities with a higher education index are expected to have a higher GDP.
  • City Longitude (LONG) and City Latitude (LAT) will be used to identify the exact location of the municipality, which allows its geometry to be plotted.
  • Tourism Category Region (REGIAO_TUR) will be used to identify whether the municipality has a tourist landmark. The reason is that a municipality with a tourist landmark is expected to have a higher GDP.
  • The following variables will be used because they contribute to the overall GDP per capita:
      ◦ Gross Added Value: Agropecuary (GVA_AGROPEC), Industry (GVA_INDUSTRY), Services (GVA_SERVICES) and Public Services (GVA_PUBLIC).
      ◦ Number of Companies by sector: Agriculture (COMP_A), Extractive industries (COMP_B), Industries of transformation (COMP_C), Electricity and gas (COMP_D), Water (COMP_E), Construction (COMP_F), Trade; repair of motor vehicles and motorcycles (COMP_G), Transport (COMP_H), Accommodation and food (COMP_I), Information and communication (COMP_J), Financial (COMP_K), Real estate activities (COMP_L), Professional (COMP_M), Administrative activities and complementary services (COMP_N), Public administration (COMP_O), Education (COMP_P), Human health and social services (COMP_Q), Arts (COMP_R), Other service activities (COMP_S), Domestic services (COMP_T) and International and other extraterritorial institutions (COMP_U).
      ◦ Total number of hotels (HOTELS), total number of private bank agencies (Pr_Agencies) and total number of public bank agencies (Pu_Agencies).

brazil_cities_summarised <- brazil_cities %>%
                            select(GDP_CAPITA, CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, `IBGE_15-59`, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, `IBGE_CROP_PRODUCTION_$`, IDHM_Educacao, LONG, LAT, REGIAO_TUR, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
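
To illustrate why CITY and STATE are combined into one identifier, here is a minimal sketch on a made-up three-row table. The values are illustrative placeholders rather than rows read from BRAZIL_CITIES.csv, and `toy` is a hypothetical name. It checks that each column on its own contains duplicates, while the pasted label does not.

```r
# Toy table (illustrative values only): city names repeat across states,
# and state codes repeat across cities.
toy <- data.frame(
  CITY  = c("Bonito", "Bonito", "Planalto"),
  STATE = c("BA", "MS", "BA"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every value is unique.
city_has_dupes  <- anyDuplicated(toy$CITY) > 0
state_has_dupes <- anyDuplicated(toy$STATE) > 0

# Combining both columns gives one label per row.
toy$City_State <- paste(toy$CITY, toy$STATE, sep = " - ")
label_has_dupes <- anyDuplicated(toy$City_State) > 0

print(c(city = city_has_dupes, state = state_has_dupes, label = label_has_dupes))
```

The same `anyDuplicated()` check could be run on the real table to confirm that the combined label is unique there as well.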

Now, I will check which variables have missing (NA) values.

summary(brazil_cities_summarised)
##    GDP_CAPITA         CITY              STATE              CAPITAL        
##  Min.   :  3191   Length:5573        Length:5573        Min.   :0.000000  
##  1st Qu.:  9103   Class :character   Class :character   1st Qu.:0.000000  
##  Median : 16129   Mode  :character   Mode  :character   Median :0.000000  
##  Mean   : 21306                                         Mean   :0.004845  
##  3rd Qu.: 26152                                         3rd Qu.:0.000000  
##  Max.   :314638                                         Max.   :1.000000  
##  NA's   :1476                                                             
##   IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR    IBGE_15-59     
##  Min.   :     805   Min.   :     805   Min.   :     0.0   Min.   :     94  
##  1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0   1st Qu.:   1734  
##  Median :   10934   Median :   10926   Median :     0.0   Median :   3841  
##  Mean   :   34278   Mean   :   34200   Mean   :    77.5   Mean   :  18212  
##  3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0   3rd Qu.:   9628  
##  Max.   :11253503   Max.   :11133776   Max.   :119727.0   Max.   :7058221  
##  NA's   :8          NA's   :8          NA's   :8          NA's   :8        
##       AREA          IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL  
##  Min.   :  1.0   Min.   :    239   Min.   :     60   Min.   :    3  
##  1st Qu.: 25.0   1st Qu.:   1572   1st Qu.:    874   1st Qu.:  487  
##  Median :201.8   Median :   3174   Median :   1846   Median :  931  
##  Mean   :266.1   Mean   :  10303   Mean   :   8859   Mean   : 1463  
##  3rd Qu.:410.9   3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1832  
##  Max.   :999.5   Max.   :3576148   Max.   :3548433   Max.   :33809  
##  NA's   :3       NA's   :10        NA's   :10        NA's   :81     
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM_Educacao         LONG       
##  Min.   :      0.0   Min.   :      0        Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:    910.2   1st Qu.:   2326        1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :   3471.5   Median :  13846        Median :0.5600   Median :-46.52  
##  Mean   :  14179.9   Mean   :  57384        Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:  11194.2   3rd Qu.:  55619        3rd Qu.:0.6310   3rd Qu.:-41.40  
##  Max.   :1205669.0   Max.   :3274885        Max.   :0.8250   Max.   :-32.44  
##  NA's   :3           NA's   :3              NA's   :8        NA's   :9       
##       LAT           REGIAO_TUR         GVA_AGROPEC      GVA_INDUSTRY     
##  Min.   :-33.688   Length:5573        Min.   :     0   Min.   :       1  
##  1st Qu.:-22.838   Class :character   1st Qu.:  3224   1st Qu.:    1684  
##  Median :-18.089   Mode  :character   Median : 15941   Median :    6100  
##  Mean   :-16.444                      Mean   : 31281   Mean   :  150813  
##  3rd Qu.: -8.489                      3rd Qu.: 39534   3rd Qu.:   35684  
##  Max.   :  4.585                      Max.   :655505   Max.   :15043915  
##  NA's   :9                            NA's   :1476     NA's   :1476      
##   GVA_SERVICES        GVA_PUBLIC           COMP_A            COMP_B       
##  Min.   :       2   Min.   :       7   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:    9426   1st Qu.:   15970   1st Qu.:   3.00   1st Qu.:  0.000  
##  Median :   26696   Median :   29879   Median :   7.00   Median :  1.000  
##  Mean   :  367181   Mean   :  106612   Mean   :  36.14   Mean   :  3.153  
##  3rd Qu.:   98873   3rd Qu.:   66222   3rd Qu.:  22.00   3rd Qu.:  4.000  
##  Max.   :53213122   Max.   :10664797   Max.   :1948.00   Max.   :139.000  
##  NA's   :1476       NA's   :1476       NA's   :4157      NA's   :4157     
##      COMP_C           COMP_D            COMP_E            COMP_F       
##  Min.   :   0.0   Min.   :  0.000   Min.   :  0.000   Min.   :   0.00  
##  1st Qu.:  25.0   1st Qu.:  0.000   1st Qu.:  0.000   1st Qu.:   8.00  
##  Median :  58.0   Median :  0.000   Median :  1.000   Median :  20.50  
##  Mean   : 173.4   Mean   :  0.775   Mean   :  4.525   Mean   :  95.45  
##  3rd Qu.: 151.0   3rd Qu.:  0.000   3rd Qu.:  4.000   3rd Qu.:  61.00  
##  Max.   :6025.0   Max.   :143.000   Max.   :163.000   Max.   :6373.00  
##  NA's   :4157     NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_G            COMP_H            COMP_I           COMP_J       
##  Min.   :    4.0   Min.   :   0.00   Min.   :   0.0   Min.   :   0.00  
##  1st Qu.:  101.0   1st Qu.:  15.00   1st Qu.:  14.0   1st Qu.:   2.00  
##  Median :  228.0   Median :  34.00   Median :  32.0   Median :   6.00  
##  Mean   :  699.5   Mean   :  89.51   Mean   : 122.5   Mean   :  44.79  
##  3rd Qu.:  575.8   3rd Qu.:  85.00   3rd Qu.:  93.0   3rd Qu.:  21.00  
##  Max.   :33566.0   Max.   :3873.00   Max.   :6514.0   Max.   :4535.00  
##  NA's   :4157      NA's   :4157      NA's   :4157     NA's   :4157     
##      COMP_K            COMP_L            COMP_M            COMP_N       
##  Min.   :   0.00   Min.   :   0.00   Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:   1.00   1st Qu.:   1.00   1st Qu.:    7.0   1st Qu.:    8.0  
##  Median :   4.00   Median :   4.50   Median :   17.0   Median :   20.0  
##  Mean   :  29.49   Mean   :  33.55   Mean   :  104.2   Mean   :  179.4  
##  3rd Qu.:  12.00   3rd Qu.:  18.00   3rd Qu.:   50.0   3rd Qu.:   73.0  
##  Max.   :3501.00   Max.   :2785.00   Max.   :11925.0   Max.   :17752.0  
##  NA's   :4157      NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_O            COMP_P            COMP_Q            COMP_R       
##  Min.   :  1.000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:  2.000   1st Qu.:   6.00   1st Qu.:   5.00   1st Qu.:   3.00  
##  Median :  3.000   Median :  16.00   Median :  15.00   Median :   7.00  
##  Mean   :  3.917   Mean   :  58.87   Mean   :  68.42   Mean   :  25.13  
##  3rd Qu.:  4.000   3rd Qu.:  41.25   3rd Qu.:  42.00   3rd Qu.:  20.00  
##  Max.   :120.000   Max.   :3325.00   Max.   :4642.00   Max.   :1436.00  
##  NA's   :4157      NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_S            COMP_T         COMP_U          HOTELS      
##  Min.   :   0.00   Min.   :0      Min.   :0.000   Min.   : 1.000  
##  1st Qu.:  13.00   1st Qu.:0      1st Qu.:0.000   1st Qu.: 1.000  
##  Median :  31.00   Median :0      Median :0.000   Median : 1.000  
##  Mean   :  95.61   Mean   :0      Mean   :0.026   Mean   : 3.546  
##  3rd Qu.:  68.00   3rd Qu.:0      3rd Qu.:0.000   3rd Qu.: 3.000  
##  Max.   :5327.00   Max.   :0      Max.   :8.000   Max.   :46.000  
##  NA's   :4157      NA's   :4157   NA's   :4157    NA's   :5236    
##   Pr_Agencies       Pu_Agencies    
##  Min.   :  0.000   Min.   :  0.00  
##  1st Qu.:  1.000   1st Qu.:  1.00  
##  Median :  2.000   Median :  2.00  
##  Mean   :  4.425   Mean   :  3.45  
##  3rd Qu.:  3.000   3rd Qu.:  3.00  
##  Max.   :273.000   Max.   :168.00  
##  NA's   :4299      NA's   :4299

From the summary above, it can be seen that the following variables contain NA values: GDP_CAPITA, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, IBGE_15-59, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, IBGE_CROP_PRODUCTION_$, IDHM_Educacao, LONG, LAT, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies and Pu_Agencies. Hence, these NA values need to be replaced, either with 0 or with the actual value.
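
Instead of reading the affected columns off the summary() output, the NA values can also be counted per column. The sketch below is a minimal illustration on a small hypothetical data frame: `toy` and its values are made up, and the column names are borrowed from the data set only for readability.

```r
# Hypothetical toy data with a few missing values.
toy <- data.frame(
  CITY       = c("A", "B", "C", "D"),
  GDP_CAPITA = c(3191, NA, 16129, NA),
  LONG       = c(-46.5, -50.9, NA, -41.4),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(toy))

# Keep only the columns that actually contain missing values.
cols_with_na <- na_counts[na_counts > 0]
print(cols_with_na)
```

Applied to brazil_cities_summarised, the same two lines would return one count per variable, which is easier to scan than the full summary.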

3.2.1 Replacement of NA values

The LONG and LAT variables cannot be set to 0 or left as NA, so I looked up the missing coordinates on https://en.db-city.com/. I made the changes in Excel: I duplicated the data file as BRAZIL_CITIES_2.csv and filled in the missing LONG and LAT values in the copy.

brazil_cities <- read_csv("data/aspatial/BRAZIL_CITIES_2.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   CITY = col_character(),
##   STATE = col_character(),
##   REGIAO_TUR = col_character(),
##   CATEGORIA_TUR = col_character(),
##   RURAL_URBAN = col_character(),
##   GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
brazil_cities_summarised <- brazil_cities %>%
                            select(GDP_CAPITA, CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, `IBGE_15-59`, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, `IBGE_CROP_PRODUCTION_$`, IDHM_Educacao, LONG, LAT, REGIAO_TUR, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
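
The coordinates in this report were filled in by hand in a copy of the file. As an alternative, the same fix could be scripted so that it is reproducible. The sketch below is only an illustration under stated assumptions: `cities` and `lookup` are hypothetical toy tables with made-up names and coordinates, where `lookup` stands in for values collected manually from a reference site.

```r
# Toy municipality table with missing coordinates (made-up values).
cities <- data.frame(
  CITY  = c("Alpha", "Beta", "Gamma"),
  STATE = c("AA", "BB", "CC"),
  LONG  = c(-46.6, NA, -43.2),
  LAT   = c(-23.5, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table holding the manually researched coordinates.
lookup <- data.frame(
  CITY  = c("Beta", "Gamma"),
  STATE = c("BB", "CC"),
  LONG  = c(-51.2, -43.2),
  LAT   = c(-30.0, -22.9),
  stringsAsFactors = FALSE
)

# Match rows on the combined city and state key.
key_cities <- paste(cities$CITY, cities$STATE, sep = " - ")
key_lookup <- paste(lookup$CITY, lookup$STATE, sep = " - ")
idx <- match(key_cities, key_lookup)

# Fill only the missing values; existing coordinates are left untouched.
cities$LONG <- ifelse(is.na(cities$LONG), lookup$LONG[idx], cities$LONG)
cities$LAT  <- ifelse(is.na(cities$LAT),  lookup$LAT[idx],  cities$LAT)

print(cities)
```

Keeping the corrections in a separate lookup table leaves the original file unchanged and documents exactly which values were supplied by hand.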

Now we will check whether any missing values remain in the LONG and LAT variables.

summary(brazil_cities_summarised)
##    GDP_CAPITA         CITY              STATE              CAPITAL        
##  Min.   :  3191   Length:5573        Length:5573        Min.   :0.000000  
##  1st Qu.:  9103   Class :character   Class :character   1st Qu.:0.000000  
##  Median : 16129   Mode  :character   Mode  :character   Median :0.000000  
##  Mean   : 21306                                         Mean   :0.004845  
##  3rd Qu.: 26152                                         3rd Qu.:0.000000  
##  Max.   :314638                                         Max.   :1.000000  
##  NA's   :1476                                                             
##   IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR    IBGE_15-59     
##  Min.   :     805   Min.   :     805   Min.   :     0.0   Min.   :     94  
##  1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0   1st Qu.:   1734  
##  Median :   10934   Median :   10926   Median :     0.0   Median :   3841  
##  Mean   :   34278   Mean   :   34200   Mean   :    77.5   Mean   :  18212  
##  3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0   3rd Qu.:   9628  
##  Max.   :11253503   Max.   :11133776   Max.   :119727.0   Max.   :7058221  
##  NA's   :8          NA's   :8          NA's   :8          NA's   :8        
##       AREA          IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL  
##  Min.   :  1.0   Min.   :    239   Min.   :     60   Min.   :    3  
##  1st Qu.: 25.0   1st Qu.:   1572   1st Qu.:    874   1st Qu.:  487  
##  Median :201.8   Median :   3174   Median :   1846   Median :  931  
##  Mean   :266.1   Mean   :  10303   Mean   :   8859   Mean   : 1463  
##  3rd Qu.:410.9   3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1832  
##  Max.   :999.5   Max.   :3576148   Max.   :3548433   Max.   :33809  
##  NA's   :3       NA's   :10        NA's   :10        NA's   :81     
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM_Educacao         LONG       
##  Min.   :      0.0   Min.   :      0        Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:    910.2   1st Qu.:   2326        1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :   3471.5   Median :  13846        Median :0.5600   Median :-46.52  
##  Mean   :  14179.9   Mean   :  57384        Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:  11194.2   3rd Qu.:  55619        3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :1205669.0   Max.   :3274885        Max.   :0.8250   Max.   :-32.44  
##  NA's   :3           NA's   :3              NA's   :8                        
##       LAT           REGIAO_TUR         GVA_AGROPEC      GVA_INDUSTRY     
##  Min.   :-33.688   Length:5573        Min.   :     0   Min.   :       1  
##  1st Qu.:-22.843   Class :character   1st Qu.:  3224   1st Qu.:    1684  
##  Median :-18.091   Mode  :character   Median : 15941   Median :    6100  
##  Mean   :-16.451                      Mean   : 31281   Mean   :  150813  
##  3rd Qu.: -8.490                      3rd Qu.: 39534   3rd Qu.:   35684  
##  Max.   :  4.585                      Max.   :655505   Max.   :15043915  
##                                       NA's   :1476     NA's   :1476      
##   GVA_SERVICES        GVA_PUBLIC           COMP_A            COMP_B       
##  Min.   :       2   Min.   :       7   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:    9426   1st Qu.:   15970   1st Qu.:   3.00   1st Qu.:  0.000  
##  Median :   26696   Median :   29879   Median :   7.00   Median :  1.000  
##  Mean   :  367181   Mean   :  106612   Mean   :  36.14   Mean   :  3.153  
##  3rd Qu.:   98873   3rd Qu.:   66222   3rd Qu.:  22.00   3rd Qu.:  4.000  
##  Max.   :53213122   Max.   :10664797   Max.   :1948.00   Max.   :139.000  
##  NA's   :1476       NA's   :1476       NA's   :4157      NA's   :4157     
##      COMP_C           COMP_D            COMP_E            COMP_F       
##  Min.   :   0.0   Min.   :  0.000   Min.   :  0.000   Min.   :   0.00  
##  1st Qu.:  25.0   1st Qu.:  0.000   1st Qu.:  0.000   1st Qu.:   8.00  
##  Median :  58.0   Median :  0.000   Median :  1.000   Median :  20.50  
##  Mean   : 173.4   Mean   :  0.775   Mean   :  4.525   Mean   :  95.45  
##  3rd Qu.: 151.0   3rd Qu.:  0.000   3rd Qu.:  4.000   3rd Qu.:  61.00  
##  Max.   :6025.0   Max.   :143.000   Max.   :163.000   Max.   :6373.00  
##  NA's   :4157     NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_G            COMP_H            COMP_I           COMP_J       
##  Min.   :    4.0   Min.   :   0.00   Min.   :   0.0   Min.   :   0.00  
##  1st Qu.:  101.0   1st Qu.:  15.00   1st Qu.:  14.0   1st Qu.:   2.00  
##  Median :  228.0   Median :  34.00   Median :  32.0   Median :   6.00  
##  Mean   :  699.5   Mean   :  89.51   Mean   : 122.5   Mean   :  44.79  
##  3rd Qu.:  575.8   3rd Qu.:  85.00   3rd Qu.:  93.0   3rd Qu.:  21.00  
##  Max.   :33566.0   Max.   :3873.00   Max.   :6514.0   Max.   :4535.00  
##  NA's   :4157      NA's   :4157      NA's   :4157     NA's   :4157     
##      COMP_K            COMP_L            COMP_M            COMP_N       
##  Min.   :   0.00   Min.   :   0.00   Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:   1.00   1st Qu.:   1.00   1st Qu.:    7.0   1st Qu.:    8.0  
##  Median :   4.00   Median :   4.50   Median :   17.0   Median :   20.0  
##  Mean   :  29.49   Mean   :  33.55   Mean   :  104.2   Mean   :  179.4  
##  3rd Qu.:  12.00   3rd Qu.:  18.00   3rd Qu.:   50.0   3rd Qu.:   73.0  
##  Max.   :3501.00   Max.   :2785.00   Max.   :11925.0   Max.   :17752.0  
##  NA's   :4157      NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_O            COMP_P            COMP_Q            COMP_R       
##  Min.   :  1.000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:  2.000   1st Qu.:   6.00   1st Qu.:   5.00   1st Qu.:   3.00  
##  Median :  3.000   Median :  16.00   Median :  15.00   Median :   7.00  
##  Mean   :  3.917   Mean   :  58.87   Mean   :  68.42   Mean   :  25.13  
##  3rd Qu.:  4.000   3rd Qu.:  41.25   3rd Qu.:  42.00   3rd Qu.:  20.00  
##  Max.   :120.000   Max.   :3325.00   Max.   :4642.00   Max.   :1436.00  
##  NA's   :4157      NA's   :4157      NA's   :4157      NA's   :4157     
##      COMP_S            COMP_T         COMP_U          HOTELS      
##  Min.   :   0.00   Min.   :0      Min.   :0.000   Min.   : 1.000  
##  1st Qu.:  13.00   1st Qu.:0      1st Qu.:0.000   1st Qu.: 1.000  
##  Median :  31.00   Median :0      Median :0.000   Median : 1.000  
##  Mean   :  95.61   Mean   :0      Mean   :0.026   Mean   : 3.546  
##  3rd Qu.:  68.00   3rd Qu.:0      3rd Qu.:0.000   3rd Qu.: 3.000  
##  Max.   :5327.00   Max.   :0      Max.   :8.000   Max.   :46.000  
##  NA's   :4157      NA's   :4157   NA's   :4157    NA's   :5236    
##   Pr_Agencies       Pu_Agencies    
##  Min.   :  0.000   Min.   :  0.00  
##  1st Qu.:  1.000   1st Qu.:  1.00  
##  Median :  2.000   Median :  2.00  
##  Mean   :  4.425   Mean   :  3.45  
##  3rd Qu.:  3.000   3rd Qu.:  3.00  
##  Max.   :273.000   Max.   :168.00  
##  NA's   :4299      NA's   :4299

From the results above, it can be seen that the LAT and LONG variables no longer contain NA values. Next, I will replace all remaining NA values in the numeric variables with 0, using the following code chunk.

brazil_cities_summarised <- brazil_cities_summarised %>%
                            mutate_if(is.numeric, ~replace(., is.na(.), 0))
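
For reference, `mutate_if()` has been superseded by `across()` since dplyr 1.0.0. The sketch below shows an equivalent replacement written with `across()` on a hypothetical toy data frame (`toy` and its values are made up). It assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Toy data: one character column and two numeric columns with NA values.
toy <- data.frame(
  CITY   = c("A", NA, "C"),
  COMP_A = c(3, NA, 7),
  HOTELS = c(NA, 1, NA),
  stringsAsFactors = FALSE
)

# Replace NA with 0 in numeric columns only; character columns are untouched.
filled <- toy %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

print(filled)
```

Character columns such as CITY keep their NA values, which matches the behaviour of the `mutate_if(is.numeric, ...)` call used above.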

Now I will check whether there are any more missing values.

summary(brazil_cities_summarised)
##    GDP_CAPITA         CITY              STATE              CAPITAL        
##  Min.   :     0   Length:5573        Length:5573        Min.   :0.000000  
##  1st Qu.:     0   Class :character   Class :character   1st Qu.:0.000000  
##  Median : 10474   Mode  :character   Mode  :character   Median :0.000000  
##  Mean   : 15663                                         Mean   :0.004845  
##  3rd Qu.: 21967                                         3rd Qu.:0.000000  
##  Max.   :314638                                         Max.   :1.000000  
##   IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR     IBGE_15-59     
##  Min.   :       0   Min.   :       0   Min.   :     0.00   Min.   :      0  
##  1st Qu.:    5217   1st Qu.:    5217   1st Qu.:     0.00   1st Qu.:   1728  
##  Median :   10927   Median :   10916   Median :     0.00   Median :   3835  
##  Mean   :   34229   Mean   :   34151   Mean   :    77.39   Mean   :  18186  
##  3rd Qu.:   23397   3rd Qu.:   23380   3rd Qu.:    10.00   3rd Qu.:   9591  
##  Max.   :11253503   Max.   :11133776   Max.   :119727.00   Max.   :7058221  
##       AREA          IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL  
##  Min.   :  0.0   Min.   :      0   Min.   :      0   Min.   :    0  
##  1st Qu.: 25.0   1st Qu.:   1566   1st Qu.:    870   1st Qu.:  470  
##  Median :201.5   Median :   3169   Median :   1839   Median :  916  
##  Mean   :265.9   Mean   :  10284   Mean   :   8843   Mean   : 1441  
##  3rd Qu.:410.9   3rd Qu.:   6718   3rd Qu.:   4615   3rd Qu.: 1812  
##  Max.   :999.5   Max.   :3576148   Max.   :3548433   Max.   :33809  
##  IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM_Educacao         LONG       
##  Min.   :      0   Min.   :      0        Min.   :0.0000   Min.   :-72.92  
##  1st Qu.:    908   1st Qu.:   2322        1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :   3464   Median :  13832        Median :0.5600   Median :-46.52  
##  Mean   :  14172   Mean   :  57353        Mean   :0.5583   Mean   :-46.23  
##  3rd Qu.:  11174   3rd Qu.:  55608        3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :1205669   Max.   :3274885        Max.   :0.8250   Max.   :-32.44  
##       LAT           REGIAO_TUR         GVA_AGROPEC      GVA_INDUSTRY     
##  Min.   :-33.688   Length:5573        Min.   :     0   Min.   :       0  
##  1st Qu.:-22.843   Class :character   1st Qu.:     0   1st Qu.:       0  
##  Median :-18.091   Mode  :character   Median :  6127   Median :    2434  
##  Mean   :-16.451                      Mean   : 22996   Mean   :  110871  
##  3rd Qu.: -8.490                      3rd Qu.: 28262   3rd Qu.:   16754  
##  Max.   :  4.585                      Max.   :655505   Max.   :15043915  
##   GVA_SERVICES        GVA_PUBLIC           COMP_A             COMP_B        
##  Min.   :       0   Min.   :       0   Min.   :   0.000   Min.   :  0.0000  
##  1st Qu.:       0   1st Qu.:       0   1st Qu.:   0.000   1st Qu.:  0.0000  
##  Median :   12535   Median :   19382   Median :   0.000   Median :  0.0000  
##  Mean   :  269933   Mean   :   78376   Mean   :   9.182   Mean   :  0.8012  
##  3rd Qu.:   56919   3rd Qu.:   48201   3rd Qu.:   0.000   3rd Qu.:  0.0000  
##  Max.   :53213122   Max.   :10664797   Max.   :1948.000   Max.   :139.0000  
##      COMP_C            COMP_D             COMP_E           COMP_F       
##  Min.   :   0.00   Min.   :  0.0000   Min.   :  0.00   Min.   :   0.00  
##  1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.:  0.00   1st Qu.:   0.00  
##  Median :   0.00   Median :  0.0000   Median :  0.00   Median :   0.00  
##  Mean   :  44.07   Mean   :  0.1968   Mean   :  1.15   Mean   :  24.25  
##  3rd Qu.:   2.00   3rd Qu.:  0.0000   3rd Qu.:  0.00   3rd Qu.:   1.00  
##  Max.   :6025.00   Max.   :143.0000   Max.   :163.00   Max.   :6373.00  
##      COMP_G            COMP_H            COMP_I            COMP_J       
##  Min.   :    0.0   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:    0.0   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00  
##  Median :    0.0   Median :   0.00   Median :   0.00   Median :   0.00  
##  Mean   :  177.7   Mean   :  22.74   Mean   :  31.12   Mean   :  11.38  
##  3rd Qu.:   22.0   3rd Qu.:   2.00   3rd Qu.:   2.00   3rd Qu.:   0.00  
##  Max.   :33566.0   Max.   :3873.00   Max.   :6514.00   Max.   :4535.00  
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :   0.000   Min.   :   0.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:   0.000   1st Qu.:   0.000   1st Qu.:    0.00   1st Qu.:    0.00  
##  Median :   0.000   Median :   0.000   Median :    0.00   Median :    0.00  
##  Mean   :   7.494   Mean   :   8.525   Mean   :   26.47   Mean   :   45.57  
##  3rd Qu.:   0.000   3rd Qu.:   0.000   3rd Qu.:    1.00   3rd Qu.:    1.00  
##  Max.   :3501.000   Max.   :2785.000   Max.   :11925.00   Max.   :17752.00  
##      COMP_O             COMP_P            COMP_Q            COMP_R        
##  Min.   :  0.0000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.000  
##  1st Qu.:  0.0000   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.000  
##  Median :  0.0000   Median :   0.00   Median :   0.00   Median :   0.000  
##  Mean   :  0.9952   Mean   :  14.96   Mean   :  17.38   Mean   :   6.385  
##  3rd Qu.:  1.0000   3rd Qu.:   1.00   3rd Qu.:   0.00   3rd Qu.:   0.000  
##  Max.   :120.0000   Max.   :3325.00   Max.   :4642.00   Max.   :1436.000  
##      COMP_S            COMP_T      COMP_U             HOTELS       
##  Min.   :   0.00   Min.   :0   Min.   :0.000000   Min.   : 0.0000  
##  1st Qu.:   0.00   1st Qu.:0   1st Qu.:0.000000   1st Qu.: 0.0000  
##  Median :   0.00   Median :0   Median :0.000000   Median : 0.0000  
##  Mean   :  24.29   Mean   :0   Mean   :0.006639   Mean   : 0.2144  
##  3rd Qu.:   2.00   3rd Qu.:0   3rd Qu.:0.000000   3rd Qu.: 0.0000  
##  Max.   :5327.00   Max.   :0   Max.   :8.000000   Max.   :46.0000  
##   Pr_Agencies       Pu_Agencies      
##  Min.   :  0.000   Min.   :  0.0000  
##  1st Qu.:  0.000   1st Qu.:  0.0000  
##  Median :  0.000   Median :  0.0000  
##  Mean   :  1.012   Mean   :  0.7886  
##  3rd Qu.:  0.000   3rd Qu.:  0.0000  
##  Max.   :273.000   Max.   :168.0000

3.2.2 Summarise the data

Now, I will derive the variables described above from the brazil_cities_summarised data. From the result above, it can be seen that COMP_T is 0 for every municipality, hence I will remove it while summarising.

brazil_cities_summ <- brazil_cities_summarised %>%
                      # One label per municipality, since CITY and STATE each contain duplicates
                      mutate(City_State = paste(CITY, STATE, sep = " - ")) %>%
                      # Shares of the resident population
                      mutate(Brazilian_Percentage = IBGE_RES_POP_BRAS / IBGE_RES_POP) %>%
                      mutate(Foreign_Percentage = IBGE_RES_POP_ESTR / IBGE_RES_POP) %>%
                      mutate(Working_Percentage = `IBGE_15-59` / IBGE_RES_POP) %>%
                      # Shares of domestic units
                      mutate(Urban_Percentage = IBGE_DU_URBAN / IBGE_DU) %>%
                      mutate(Rural_Percentage = IBGE_DU_RURAL / IBGE_DU) %>%
                      # Crop production per unit of planted area
                      mutate(Production_Area = `IBGE_CROP_PRODUCTION_$` / IBGE_PLANTED_AREA) %>%
                      # 1 if REGIAO_TUR is present, 0 if it is NA
                      mutate(Tourism_Area = ifelse(is.na(REGIAO_TUR), 0, 1)) %>%
                      # COMP_T is dropped because it is 0 for every municipality
                      select(City_State, LONG, LAT, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
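
One point worth keeping in mind with the ratios above: in R, dividing 0 by 0 gives NaN, and summary() reports NaN under NA's. Rows whose denominator was filled with 0 are therefore a likely source of NA's in the derived percentages. The sketch below demonstrates this on made-up numbers. `safe_ratio()` is a hypothetical helper, not part of the original workflow, that makes the zero-denominator case explicit.

```r
# Toy values (made up): the second row has a population of 0.
toy <- data.frame(
  IBGE_RES_POP      = c(1000, 0, 500),
  IBGE_RES_POP_ESTR = c(10, 0, 0)
)

# Plain division: 0 / 0 yields NaN.
raw <- toy$IBGE_RES_POP_ESTR / toy$IBGE_RES_POP

# Hypothetical helper: return NA whenever the denominator is 0, which also
# avoids Inf when a non-zero numerator is divided by 0.
safe_ratio <- function(num, den) {
  ifelse(den == 0, NA_real_, num / den)
}

guarded <- safe_ratio(toy$IBGE_RES_POP_ESTR, toy$IBGE_RES_POP)

print(raw)
print(guarded)
```

Whether such rows should become NA, 0, or be dropped is an analysis decision; the helper only makes the case visible.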

Now I will check whether there are any more missing values.

summary(brazil_cities_summ)
##   City_State             LONG             LAT            GDP_CAPITA    
##  Length:5573        Min.   :-72.92   Min.   :-33.688   Min.   :     0  
##  Class :character   1st Qu.:-50.87   1st Qu.:-22.843   1st Qu.:     0  
##  Mode  :character   Median :-46.52   Median :-18.091   Median : 10474  
##                     Mean   :-46.23   Mean   :-16.451   Mean   : 15663  
##                     3rd Qu.:-41.41   3rd Qu.: -8.490   3rd Qu.: 21967  
##                     Max.   :-32.44   Max.   :  4.585   Max.   :314638  
##                                                                        
##  Brazilian_Percentage Foreign_Percentage Working_Percentage Urban_Percentage 
##  Min.   :0.6228       Min.   :0.000000   Min.   :0.02558    Min.   :0.04553  
##  1st Qu.:0.9993       1st Qu.:0.000000   1st Qu.:0.28244    1st Qu.:0.49157  
##  Median :1.0000       Median :0.000000   Median :0.39596    Median :0.66277  
##  Mean   :0.9992       Mean   :0.000759   Mean   :0.39751    Mean   :0.65212  
##  3rd Qu.:1.0000       3rd Qu.:0.000699   3rd Qu.:0.51724    3rd Qu.:0.83043  
##  Max.   :1.0000       Max.   :0.377218   Max.   :0.70989    Max.   :1.00000  
##  NA's   :8            NA's   :8          NA's   :8          NA's   :10       
##  Rural_Percentage Production_Area    Tourism_Area    IDHM_Educacao   
##  Min.   :0.0000   Min.   :  0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.1696   1st Qu.:  2.507   1st Qu.:0.0000   1st Qu.:0.4900  
##  Median :0.3372   Median :  3.906   Median :0.0000   Median :0.5600  
##  Mean   :0.3479   Mean   :  5.393   Mean   :0.4384   Mean   :0.5583  
##  3rd Qu.:0.5084   3rd Qu.:  6.425   3rd Qu.:1.0000   3rd Qu.:0.6310  
##  Max.   :0.9545   Max.   :106.960   Max.   :1.0000   Max.   :0.8250  
##  NA's   :10       NA's   :70                                         
##   GVA_AGROPEC      GVA_INDUSTRY       GVA_SERVICES        GVA_PUBLIC      
##  Min.   :     0   Min.   :       0   Min.   :       0   Min.   :       0  
##  1st Qu.:     0   1st Qu.:       0   1st Qu.:       0   1st Qu.:       0  
##  Median :  6127   Median :    2434   Median :   12535   Median :   19382  
##  Mean   : 22996   Mean   :  110871   Mean   :  269933   Mean   :   78376  
##  3rd Qu.: 28262   3rd Qu.:   16754   3rd Qu.:   56919   3rd Qu.:   48201  
##  Max.   :655505   Max.   :15043915   Max.   :53213122   Max.   :10664797  
##                                                                           
##      COMP_A             COMP_B             COMP_C            COMP_D        
##  Min.   :   0.000   Min.   :  0.0000   Min.   :   0.00   Min.   :  0.0000  
##  1st Qu.:   0.000   1st Qu.:  0.0000   1st Qu.:   0.00   1st Qu.:  0.0000  
##  Median :   0.000   Median :  0.0000   Median :   0.00   Median :  0.0000  
##  Mean   :   9.182   Mean   :  0.8012   Mean   :  44.07   Mean   :  0.1968  
##  3rd Qu.:   0.000   3rd Qu.:  0.0000   3rd Qu.:   2.00   3rd Qu.:  0.0000  
##  Max.   :1948.000   Max.   :139.0000   Max.   :6025.00   Max.   :143.0000  
##                                                                            
##      COMP_E           COMP_F            COMP_G            COMP_H       
##  Min.   :  0.00   Min.   :   0.00   Min.   :    0.0   Min.   :   0.00  
##  1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.:    0.0   1st Qu.:   0.00  
##  Median :  0.00   Median :   0.00   Median :    0.0   Median :   0.00  
##  Mean   :  1.15   Mean   :  24.25   Mean   :  177.7   Mean   :  22.74  
##  3rd Qu.:  0.00   3rd Qu.:   1.00   3rd Qu.:   22.0   3rd Qu.:   2.00  
##  Max.   :163.00   Max.   :6373.00   Max.   :33566.0   Max.   :3873.00  
##                                                                        
##      COMP_I            COMP_J            COMP_K             COMP_L        
##  Min.   :   0.00   Min.   :   0.00   Min.   :   0.000   Min.   :   0.000  
##  1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.000   1st Qu.:   0.000  
##  Median :   0.00   Median :   0.00   Median :   0.000   Median :   0.000  
##  Mean   :  31.12   Mean   :  11.38   Mean   :   7.494   Mean   :   8.525  
##  3rd Qu.:   2.00   3rd Qu.:   0.00   3rd Qu.:   0.000   3rd Qu.:   0.000  
##  Max.   :6514.00   Max.   :4535.00   Max.   :3501.000   Max.   :2785.000  
##                                                                           
##      COMP_M             COMP_N             COMP_O             COMP_P       
##  Min.   :    0.00   Min.   :    0.00   Min.   :  0.0000   Min.   :   0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:  0.0000   1st Qu.:   0.00  
##  Median :    0.00   Median :    0.00   Median :  0.0000   Median :   0.00  
##  Mean   :   26.47   Mean   :   45.57   Mean   :  0.9952   Mean   :  14.96  
##  3rd Qu.:    1.00   3rd Qu.:    1.00   3rd Qu.:  1.0000   3rd Qu.:   1.00  
##  Max.   :11925.00   Max.   :17752.00   Max.   :120.0000   Max.   :3325.00  
##                                                                            
##      COMP_Q            COMP_R             COMP_S            COMP_U        
##  Min.   :   0.00   Min.   :   0.000   Min.   :   0.00   Min.   :0.000000  
##  1st Qu.:   0.00   1st Qu.:   0.000   1st Qu.:   0.00   1st Qu.:0.000000  
##  Median :   0.00   Median :   0.000   Median :   0.00   Median :0.000000  
##  Mean   :  17.38   Mean   :   6.385   Mean   :  24.29   Mean   :0.006639  
##  3rd Qu.:   0.00   3rd Qu.:   0.000   3rd Qu.:   2.00   3rd Qu.:0.000000  
##  Max.   :4642.00   Max.   :1436.000   Max.   :5327.00   Max.   :8.000000  
##                                                                           
##      HOTELS         Pr_Agencies       Pu_Agencies      
##  Min.   : 0.0000   Min.   :  0.000   Min.   :  0.0000  
##  1st Qu.: 0.0000   1st Qu.:  0.000   1st Qu.:  0.0000  
##  Median : 0.0000   Median :  0.000   Median :  0.0000  
##  Mean   : 0.2144   Mean   :  1.012   Mean   :  0.7886  
##  3rd Qu.: 0.0000   3rd Qu.:  0.000   3rd Qu.:  0.0000  
##  Max.   :46.0000   Max.   :273.000   Max.   :168.0000  
## 

The summary above shows that several variables contain NA values. The code chunk below replaces every NA in the numeric variables with 0.

brazil_cities_summ <- brazil_cities_summ %>%
                      mutate_if(is.numeric, ~replace(., is.na(.), 0))
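As a minimal sketch of what this replacement does, the base R snippet below applies the same idea to a toy data frame. The column values are made up for illustration and are not taken from the Brazil data. Keep in mind that after the replacement a missing percentage or area is treated as a real 0 in every later calculation.

```r
# Toy data frame standing in for brazil_cities_summ (values are made up).
toy <- data.frame(
  City_State       = c("A - XX", "B - YY", "C - ZZ"),
  Urban_Percentage = c(0.65, NA, 0.83),
  Production_Area  = c(NA, 3.9, 6.4)
)

# Count missing values per column before replacing them.
na_before <- colSums(is.na(toy))

# Replace NA with 0 in numeric columns only; character columns are untouched.
num_cols <- vapply(toy, is.numeric, logical(1))
toy[num_cols] <- lapply(toy[num_cols], function(x) replace(x, is.na(x), 0))

# Count again to confirm that no NA is left.
na_after <- colSums(is.na(toy))
```

Comparing na_before with na_after is a quick way to confirm how many values were affected.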

4 Aspatial Data Wrangling

4.1 Converting aspatial data frame into a sf object

Currently, the brazil_cities_summ data frame is aspatial. The code chunk below converts it into a simple feature data frame by using st_as_sf() of the sf package, and then reprojects it to SIRGAS 2000 / Brazil Mercator (EPSG:5641) by using st_transform().

brazil_cities_summ.sf <- st_as_sf(brazil_cities_summ,
                                  coords = c("LONG", "LAT"),
                                  crs = 4674) %>%  # SIRGAS 2000 geographic CRS (EPSG:4674)
                         st_transform(crs = 5641)

st_crs(brazil_cities_summ.sf)
## Coordinate Reference System:
##   User input: EPSG:5641 
##   wkt:
## PROJCS["SIRGAS 2000 / Brazil Mercator",
##     GEOGCS["SIRGAS 2000",
##         DATUM["Sistema_de_Referencia_Geocentrico_para_las_AmericaS_2000",
##             SPHEROID["GRS 1980",6378137,298.257222101,
##                 AUTHORITY["EPSG","7019"]],
##             TOWGS84[0,0,0,0,0,0,0],
##             AUTHORITY["EPSG","6674"]],
##         PRIMEM["Greenwich",0,
##             AUTHORITY["EPSG","8901"]],
##         UNIT["degree",0.0174532925199433,
##             AUTHORITY["EPSG","9122"]],
##         AUTHORITY["EPSG","4674"]],
##     PROJECTION["Mercator_2SP"],
##     PARAMETER["standard_parallel_1",-2],
##     PARAMETER["central_meridian",-43],
##     PARAMETER["false_easting",5000000],
##     PARAMETER["false_northing",10000000],
##     UNIT["metre",1,
##         AUTHORITY["EPSG","9001"]],
##     AXIS["X",EAST],
##     AXIS["Y",NORTH],
##     AUTHORITY["EPSG","5641"]]
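The two steps (declare the CRS of the raw coordinates, then reproject) can be sketched on a toy table. This is only an illustration: the two points and their names are made up, and it assumes the longitude and latitude columns are in SIRGAS 2000 (EPSG:4674), the geographic CRS named inside the WKT above.

```r
library(sf)

# Two made-up points (longitude, latitude in degrees); not real municipalities.
toy <- data.frame(City_State = c("P1 - XX", "P2 - YY"),
                  LONG = c(-43.0, -47.9),
                  LAT  = c(-22.9, -15.8))

# Step 1: declare that the coordinates are SIRGAS 2000 degrees (assumption).
pts_geo <- st_as_sf(toy, coords = c("LONG", "LAT"), crs = 4674)

# Step 2: reproject to SIRGAS 2000 / Brazil Mercator, whose unit is the metre.
pts_proj <- st_transform(pts_geo, crs = 5641)

st_is_longlat(pts_geo)     # TRUE for geographic coordinates
st_is_longlat(pts_proj)    # FALSE once the data is projected
st_coordinates(pts_proj)   # projected X/Y in metres
```

Because the central meridian of EPSG:5641 is -43 with a false easting of 5000000, the first toy point (longitude -43) should land at an X of about 5000000 metres, which is a handy sanity check.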

5 Geospatial Data Wrangling

5.1 Importing geospatial data

The geospatial data used in this exercise is the set of municipality boundaries provided by read_municipality() of the geobr package. Polygon features are used to represent these geographic boundaries, and the data is referenced to SIRGAS 2000.

The code chunk below imports Brazil's municipality boundaries. The read_municipality() call is commented out; a copy of the layer saved as an ESRI shapefile is read with readOGR() of the rgdal package instead.

#mun <- read_municipality(code_muni="all", year=2016)
mun <- readOGR(dsn = "data/geospatial", layer = "muni_sf")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\IS415-Geospatial Analytics and Applications\Take Home Exercise\IS415_Take-home_Ex04\data\geospatial", layer: "muni_sf"
## with 5572 features
## It has 4 fields

5.2 Updating CRS information

The code chunk below converts the newly imported mun into an sf object and transforms it to the projected CRS used in this exercise (EPSG code 5641).

mun_sirgas2000 <- st_as_sf(mun) %>%
                  st_transform(crs = 5641)

After the transformation, you can verify the projection of mun_sirgas2000 by using st_crs() of the sf package, as shown in the code chunk below.

st_crs(mun_sirgas2000)
## Coordinate Reference System:
##   User input: EPSG:5641 
##   wkt:
## PROJCS["SIRGAS 2000 / Brazil Mercator",
##     GEOGCS["SIRGAS 2000",
##         DATUM["Sistema_de_Referencia_Geocentrico_para_las_AmericaS_2000",
##             SPHEROID["GRS 1980",6378137,298.257222101,
##                 AUTHORITY["EPSG","7019"]],
##             TOWGS84[0,0,0,0,0,0,0],
##             AUTHORITY["EPSG","6674"]],
##         PRIMEM["Greenwich",0,
##             AUTHORITY["EPSG","8901"]],
##         UNIT["degree",0.0174532925199433,
##             AUTHORITY["EPSG","9122"]],
##         AUTHORITY["EPSG","4674"]],
##     PROJECTION["Mercator_2SP"],
##     PARAMETER["standard_parallel_1",-2],
##     PARAMETER["central_meridian",-43],
##     PARAMETER["false_easting",5000000],
##     PARAMETER["false_northing",10000000],
##     UNIT["metre",1,
##         AUTHORITY["EPSG","9001"]],
##     AXIS["X",EAST],
##     AXIS["Y",NORTH],
##     AUTHORITY["EPSG","5641"]]

Next, you will reveal the extent of mun_sirgas2000 by using st_bbox() of sf package.

st_bbox(mun_sirgas2000)
##     xmin     ymin     xmax     ymax 
##  1552246  6030702  6575781 10583412

6 Exploratory Data Analysis

6.1 Combining both datasets

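In order to plot a choropleth map, I will combine the two data sets by using st_join() of the sf package, which attaches the attributes of the points in brazil_cities_summ.sf to the municipality polygons they fall in.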

mun_brazil_city <- st_join(mun_sirgas2000, brazil_cities_summ.sf) 
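A minimal sketch of how st_join() behaves with polygons on the left and points on the right is shown below. The squares, points and attribute names are made up for illustration. By default st_join() performs a left join using st_intersects, so a polygon that contains two points appears twice in the result and a polygon with no point is kept with NA attributes. This is one reason a joined layer can end up with more rows than there are polygons.

```r
library(sf)

# Helper that builds a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two made-up "municipalities" side by side.
polys <- st_sf(name = c("left", "right"),
               geometry = st_sfc(sq(0), sq(1), crs = 5641))

# Three made-up points: two inside "left", one outside both squares.
pts <- st_sf(gdp = c(10, 20, 30),
             geometry = st_sfc(st_point(c(0.5, 0.5)),
                               st_point(c(0.2, 0.8)),
                               st_point(c(5, 5)),
                               crs = 5641))

# Left join on st_intersects (the defaults of st_join()).
joined <- st_join(polys, pts)
```

Checking nrow(joined) against nrow(polys) after a join is a quick way to spot duplicated or unmatched polygons.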

6.2 Drawing Statistical Polygon Map

Next, we want to reveal the geospatial distribution of GDP per capita in Brazil. The map will be prepared by using the tmap package.

The code chunk below creates a choropleth map of GDP_CAPITA using quantile classes.

tm_shape(mun_brazil_city) +  
  tm_polygons(col = "GDP_CAPITA",
              alpha = 0.6,
              style="quantile",
              popup.vars = c("City_State","GDP_CAPITA"))

From the map, we can see that GDP per capita is unevenly distributed across Brazil.

7 Hedonic Pricing Modelling in R

In this section, I will be building hedonic pricing models for GDP per capita using lm() of base R.

7.1 Multiple Linear Regression Method

7.1.1 Visualising the relationships of the independent variables

Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated with each other. If highly correlated independent variables are used in building a regression model by mistake, the quality of the model will be compromised.

A correlation matrix is commonly used to visualise the relationships between the independent variables. Besides pairs() of base R, there are many packages that support the display of a correlation matrix. In this section, the corrplot package will be used.

The code chunk below is used to plot a correlation matrix of the independent variables in the brazil_cities_summ data frame. I will be excluding variables whose pairwise correlation is higher than 0.8. I have used "AOE" as the order so that the blues and the reds are grouped in each corner, and "number" as the method so that I can identify exactly which correlations are bigger than 0.8.

corrplot(cor(brazil_cities_summ[,5:39]), diag = FALSE, order = "AOE", 
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

From the correlation matrix, the following pairs of variables are highly correlated with each other:

  • Urban_Percentage and Working_Percentage
  • COMP_C and COMP_E
  • GVA_PUBLIC and COMP_G
  • COMP_G and COMP_I
  • COMP_I and GVA_SERVICES
  • GVA_SERVICES and Pu_Agencies
  • Pu_Agencies and COMP_P
  • COMP_P and COMP_S
  • COMP_S and COMP_R
  • COMP_R and COMP_F
  • COMP_F and Pr_Agencies
  • Pr_Agencies and COMP_Q
  • COMP_Q and COMP_L
  • COMP_L and COMP_J
  • COMP_J and COMP_N
  • COMP_N and COMP_M
  • COMP_M and COMP_K

In view of this, it is wiser to include only one variable from each pair in the subsequent model building. As a result, Urban_Percentage, COMP_C, COMP_G, GVA_SERVICES, COMP_P, COMP_R, Pr_Agencies, COMP_L, COMP_N and COMP_M are excluded. Because the plot above is not clearly legible with so many variables, I will remove these 10 variables and then replot the correlation matrix to check whether any further variables should be taken out.

brazil_cities_summ <- brazil_cities_summ %>%
                      select(City_State, LONG, LAT, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_PUBLIC, COMP_A, COMP_B, COMP_D, COMP_E, COMP_F, COMP_H, COMP_I, COMP_J, COMP_K, COMP_O, COMP_Q, COMP_S, COMP_U, HOTELS, Pu_Agencies)

I will now replot the correlation matrix with the remaining variables.

corrplot(cor(brazil_cities_summ[,5:29]), diag = FALSE, order = "AOE", 
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

From the correlation matrix, there are still multiple variables that are highly correlated. In view of this, it is wiser to include only one variable from each correlated pair in the subsequent model building. As a result, Rural_Percentage, COMP_K, COMP_J, COMP_Q, COMP_F, COMP_S, Pu_Agencies, COMP_I, COMP_H and GVA_PUBLIC are excluded in the subsequent model building.
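Reading a crowded corrplot by eye is error-prone, so the 0.8 screening can also be done programmatically. The base R sketch below lists every pair whose absolute correlation exceeds the threshold. It uses simulated variables (x1, x2, x3 are hypothetical names), not the Brazil data.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1, so highly correlated
x3 <- rnorm(n)                  # generated independently of x1 and x2
toy <- data.frame(x1, x2, x3)

cm <- cor(toy)

# Keep the upper triangle only so that each pair is listed once.
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
high_pairs
```

The same pattern could be applied to cor(brazil_cities_summ[, 5:29]) to obtain the list of pairs as a table instead of reading it off the plot.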

7.1.2 Building a hedonic pricing model using multiple linear regression method

The code chunk below uses lm() to calibrate the multiple linear regression model. I will be using a confidence level of 95%, hence the alpha value will be 0.05.

brazil.mlr <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Foreign_Percentage + Working_Percentage + Production_Area + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_A + COMP_B  + COMP_D + COMP_E + COMP_O + COMP_U + HOTELS + GVA_INDUSTRY, data=brazil_cities_summ.sf)

summary(brazil.mlr)
## 
## Call:
## lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Foreign_Percentage + 
##     Working_Percentage + Production_Area + Tourism_Area + IDHM_Educacao + 
##     GVA_AGROPEC + COMP_A + COMP_B + COMP_D + COMP_E + COMP_O + 
##     COMP_U + HOTELS + GVA_INDUSTRY, data = brazil_cities_summ.sf)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -88669  -7038  -2113   3836 259983 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -2.328e+03  5.559e+03  -0.419   0.6754    
## Brazilian_Percentage -2.224e+04  5.692e+03  -3.908 9.41e-05 ***
## Foreign_Percentage   -2.938e+03  3.947e+04  -0.074   0.9407    
## Working_Percentage   -1.142e+04  1.871e+03  -6.101 1.12e-09 ***
## Production_Area       7.861e+00  3.453e+01   0.228   0.8199    
## Tourism_Area          7.085e+03  4.655e+02  15.221  < 2e-16 ***
## IDHM_Educacao         6.834e+04  3.007e+03  22.724  < 2e-16 ***
## GVA_AGROPEC           9.947e-02  5.196e-03  19.143  < 2e-16 ***
## COMP_A                6.087e+00  3.547e+00   1.716   0.0862 .  
## COMP_B                3.979e+01  6.471e+01   0.615   0.5387    
## COMP_D               -3.004e+02  1.275e+02  -2.356   0.0185 *  
## COMP_E               -8.603e+02  6.555e+01 -13.124  < 2e-16 ***
## COMP_O                5.990e+02  9.784e+01   6.123 9.84e-10 ***
## COMP_U               -7.573e+03  1.740e+03  -4.352 1.37e-05 ***
## HOTELS               -7.259e+01  1.538e+02  -0.472   0.6370    
## GVA_INDUSTRY          1.409e-02  4.759e-04  29.595  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15700 on 5557 degrees of freedom
## Multiple R-squared:  0.3929, Adjusted R-squared:  0.3913 
## F-statistic: 239.8 on 15 and 5557 DF,  p-value: < 2.2e-16

With reference to the report above, it is clear that not all the independent variables are statistically significant at the 0.05 level. We will revise the model by removing the variables that are not statistically significant, namely Foreign_Percentage, Production_Area, COMP_A, COMP_B and HOTELS.

Based on the adjusted R-squared value, the model is able to account for 39.13% of the variation in GDP per capita.
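For reference, the adjusted R-squared can be recomputed from the figures printed in the report above. The sketch below uses the standard formula with the rounded Multiple R-squared of 0.3929, so it reproduces the reported 0.3913 only up to rounding.

```r
r2 <- 0.3929   # Multiple R-squared reported by summary(brazil.mlr)
p  <- 15       # number of predictors in brazil.mlr
n  <- 5573     # observations: 5557 residual df + 15 predictors + 1 intercept

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

With more than five thousand observations and only 15 predictors, the penalty is tiny, which is why the two values differ only in the third decimal place.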

Now, we are ready to calibrate the revised model by using the code chunk below.

brazil.mlr1 <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.sf)

ols_regress(brazil.mlr1)
##                             Model Summary                             
## ---------------------------------------------------------------------
## R                       0.627       RMSE                   15695.730 
## R-Squared               0.393       Coef. Var                100.210 
## Adj. R-Squared          0.391       MSE                246355925.507 
## Pred R-Squared          0.376       MAE                     8479.728 
## ---------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                        ANOVA                                        
## -----------------------------------------------------------------------------------
##                         Sum of                                                     
##                        Squares          DF        Mean Square       F         Sig. 
## -----------------------------------------------------------------------------------
## Regression    885468197318.578          10    88546819731.858    359.426    0.0000 
## Residual          1.370232e+12        5562      246355925.507                      
## Total               2.2557e+12        5572                                         
## -----------------------------------------------------------------------------------
## 
##                                              Parameter Estimates                                               
## --------------------------------------------------------------------------------------------------------------
##                model          Beta    Std. Error    Std. Beta       t        Sig          lower         upper 
## --------------------------------------------------------------------------------------------------------------
##          (Intercept)     -2404.316      5502.145                  -0.437    0.662    -13190.669      8382.038 
## Brazilian_Percentage    -22419.987      5617.164       -0.043     -3.991    0.000    -33431.822    -11408.153 
##   Working_Percentage    -11206.975      1865.409       -0.082     -6.008    0.000    -14863.906     -7550.044 
##         Tourism_Area      7118.977       461.724        0.176     15.418    0.000      6213.818      8024.136 
##        IDHM_Educacao     68752.352      2981.038        0.327     23.063    0.000     62908.352     74596.351 
##          GVA_AGROPEC         0.100         0.005        0.215     19.448    0.000         0.090         0.111 
##               COMP_D      -303.942       126.558       -0.047     -2.402    0.016      -552.045       -55.839 
##               COMP_E      -846.379        63.859       -0.255    -13.254    0.000      -971.567      -721.190 
##               COMP_O       599.528        94.557        0.124      6.340    0.000       414.160       784.896 
##               COMP_U     -7616.105      1723.806       -0.073     -4.418    0.000    -10995.438     -4236.772 
##         GVA_INDUSTRY         0.014         0.000        0.448     29.713    0.000         0.013         0.015 
## --------------------------------------------------------------------------------------------------------------

With the revised model, we retain only the statistically significant variables, i.e. those with a p-value smaller than 0.05. Removing the non-significant variables leaves the adjusted R-squared essentially unchanged (0.3913 before, 0.391 in the three-decimal output above).

Based on the adjusted R-squared value, the revised model is able to account for 39.1% of the variation in GDP per capita.

7.1.3 Checking for multicollinearity

In this section, I will be using the olsrr package, which provides the following methods for building better multiple linear regression models:

  • comprehensive regression output
  • residual diagnostics
  • measures of influence
  • heteroskedasticity tests
  • collinearity diagnostics
  • model fit assessment
  • variable contribution assessment
  • variable selection procedures

In the code chunk below, the ols_vif_tol() of the olsrr package is used to test whether there are signs of multicollinearity.

ols_vif_tol(brazil.mlr1)
##               Variables Tolerance      VIF
## 1  Brazilian_Percentage 0.9589430 1.042815
## 2    Working_Percentage 0.5923563 1.688173
## 3          Tourism_Area 0.8422091 1.187354
## 4         IDHM_Educacao 0.5440004 1.838234
## 5           GVA_AGROPEC 0.8974213 1.114304
## 6                COMP_D 0.2830552 3.532879
## 7                COMP_E 0.2954672 3.384471
## 8                COMP_O 0.2839964 3.521171
## 9                COMP_U 0.3971465 2.517963
## 10         GVA_INDUSTRY 0.4801487 2.082688

Since the VIF values of all the independent variables are less than 10, I can safely conclude that there is no sign of multicollinearity among the independent variables.
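To make the two columns of the table concrete: the tolerance of a predictor is 1 minus the R-squared obtained by regressing it on all the other predictors, and the VIF is the reciprocal of the tolerance. The sketch below computes both by hand in base R. It uses the mtcars dataset that ships with R as a stand-in, not the Brazil data, and the choice of four columns is arbitrary.

```r
# mtcars ships with base R; four of its columns stand in for the predictors.
predictors <- mtcars[, c("disp", "hp", "wt", "drat")]

vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  # Regress predictor v on the remaining predictors.
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)   # VIF of predictor v
})
tolerance <- 1 / vif_manual

# Same relationship in the table above: COMP_D has tolerance 0.2830552.
vif_comp_d <- 1 / 0.2830552
```

Here vif_comp_d matches the 3.532879 reported for COMP_D, which confirms that the two columns of ols_vif_tol() carry the same information.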

7.1.4 Test for Non-Linearity

In multiple linear regression, it is important to test the assumption of linearity and additivity of the relationship between the dependent and independent variables.

In the code chunk below, the ols_plot_resid_fit() of olsrr package is used to perform linearity assumption test.

ols_plot_resid_fit(brazil.mlr1)

The figure above reveals that most of the data points are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent variable and independent variables are linear.

7.1.5 Test for Normality Assumption

Lastly, the code chunk below uses ols_plot_resid_hist() of olsrr package to perform normality assumption test.

ols_plot_resid_hist(brazil.mlr1)

The figure reveals that the residual of the multiple linear regression model (i.e. brazil.mlr1) resembles a normal distribution pattern.

ols_test_normality() has a restriction that the sample size must be between 3 and 5000. The current dataset has 5573 observations, hence I will take a random sample of 5000 observations to test for normality.

sample_brazil_cities_summ.sf <- brazil_cities_summ.sf[sample(nrow(brazil_cities_summ.sf), 5000), ]
brazil.mlr2 <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_A  + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=sample_brazil_cities_summ.sf)

ols_regress(brazil.mlr2)
##                             Model Summary                             
## ---------------------------------------------------------------------
## R                       0.628       RMSE                   15927.625 
## R-Squared               0.394       Coef. Var                101.918 
## Adj. R-Squared          0.393       MSE                253689227.042 
## Pred R-Squared          0.374       MAE                     8517.410 
## ---------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                        ANOVA                                        
## -----------------------------------------------------------------------------------
##                         Sum of                                                     
##                        Squares          DF        Mean Square       F         Sig. 
## -----------------------------------------------------------------------------------
## Regression    824200920636.918          11    74927356421.538    295.351    0.0000 
## Residual          1.265402e+12        4988      253689227.042                      
## Total             2.089603e+12        4999                                         
## -----------------------------------------------------------------------------------
## 
##                                              Parameter Estimates                                              
## -------------------------------------------------------------------------------------------------------------
##                model          Beta    Std. Error    Std. Beta       t        Sig          lower        upper 
## -------------------------------------------------------------------------------------------------------------
##          (Intercept)     -2869.700      5963.561                  -0.481    0.630    -14560.901     8821.502 
## Brazilian_Percentage    -21493.010      6079.729       -0.040     -3.535    0.000    -33411.953    -9574.067 
##   Working_Percentage    -11590.481      2001.926       -0.083     -5.790    0.000    -15515.135    -7665.826 
##         Tourism_Area      7135.556       496.391        0.173     14.375    0.000      6162.411     8108.700 
##        IDHM_Educacao     67969.100      3195.980        0.319     21.267    0.000     61703.574    74234.626 
##          GVA_AGROPEC         0.102         0.006        0.214     18.341    0.000         0.091        0.113 
##               COMP_A         8.158         3.770        0.025      2.164    0.031         0.767       15.549 
##               COMP_D      -339.835       136.322       -0.053     -2.493    0.013      -607.086      -72.584 
##               COMP_E      -914.713        68.236       -0.279    -13.405    0.000     -1048.485     -780.941 
##               COMP_O       667.008       104.792        0.134      6.365    0.000       461.570      872.446 
##               COMP_U     -6439.908      2020.122       -0.056     -3.188    0.001    -10400.236    -2479.581 
##         GVA_INDUSTRY         0.014         0.000        0.460     28.955    0.000         0.013        0.015 
## -------------------------------------------------------------------------------------------------------------
ols_test_normality(brazil.mlr2)
## Warning in ks.test(y, "pnorm", mean(y), sd(y)): ties should not be present for
## the Kolmogorov-Smirnov test
## -----------------------------------------------
##        Test             Statistic       pvalue  
## -----------------------------------------------
## Shapiro-Wilk              0.6229         0.0000 
## Kolmogorov-Smirnov        0.1786         0.0000 
## Cramer-von Mises         463.7117        0.0000 
## Anderson-Darling         345.7266        0.0000 
## -----------------------------------------------

The summary table above reveals that the p-values of the four tests are far smaller than the alpha value of 0.05. Hence we reject the null hypothesis that the residuals are normally distributed and conclude that the residuals do not follow a normal distribution.
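The sample size restriction and the sampling workaround can be sketched in base R with shapiro.test(), which enforces the 3 to 5000 limit. The residuals below are simulated from a skewed distribution purely for illustration, and the seed value is an arbitrary choice added so that the random sample can be reproduced.

```r
set.seed(42)             # arbitrary seed, only for reproducibility
res <- rexp(6000) - 1    # made-up, strongly skewed stand-in for residuals

# shapiro.test() refuses more than 5000 values, so the full vector fails...
too_big <- inherits(try(shapiro.test(res), silent = TRUE), "try-error")

# ...while a random subsample of 5000 values is accepted.
sub <- sample(res, 5000)
sw  <- shapiro.test(sub)

# Null hypothesis: the values are normally distributed.
reject_normality <- sw$p.value < 0.05
```

Calling set.seed() before sample() in the analysis above would likewise make the 5000-row subsample, and therefore the reported test statistics, repeatable between runs.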

7.1.6 Testing for Spatial Autocorrelation

The hedonic model I am building uses geographically referenced attributes, hence it is also important to visualise the residuals of the model. In order to perform the spatial autocorrelation test, brazil_cities_summ.sf needs to be converted from a simple feature object into a SpatialPointsDataFrame.

In this section, I will perform a test of absence of spatial autocorrelation for the residuals.

The test hypotheses are:

H0: The residuals are randomly distributed in space.

H1: The residuals are not randomly distributed in space.

A 95% confidence level will be used, so our alpha value will be 0.05.

First, we will export the residuals of the hedonic pricing model and save them as a data frame. Next, we will join the newly created data frame with the brazil_cities_summ.sf object.

Below is the code chunk used to complete the tasks.

mlr.output <- as.data.frame(brazil.mlr1$residuals)

brazil_cities_summ.res.sf <- cbind(brazil_cities_summ.sf, 
                             brazil.mlr1$residuals) %>%
                             rename(`MLR_RES` = `brazil.mlr1.residuals`)

In order to plot a choropleth map, I will be combining both data sets together to present the values.

mun_brazil_cities_mlr.sf <- st_join(mun_sirgas2000, brazil_cities_summ.res.sf) 

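Next, we will convert the simple feature objects into sp objects because the spdep functions used below work with sp conformed spatial data objects: mun_brazil_cities_mlr.sf becomes a SpatialPolygonsDataFrame and brazil_cities_summ.res.sf becomes a SpatialPointsDataFrame.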

The code chunk below will be used to perform the data conversion process.

mun_brazil_cities_mlr.sp <- as_Spatial(mun_brazil_cities_mlr.sf)
mun_brazil_cities_mlr.sp
## class       : SpatialPolygonsDataFrame 
## features    : 5575 
## extent      : 1552246, 6575781, 6030702, 10583412  (xmin, xmax, ymin, ymax)
## crs         : +proj=merc +lon_0=-43 +lat_ts=-2 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 42
## names       : code_mn, name_mn, cod_stt, abbrv_s,           City_State, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage,  Rural_Percentage,  Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, ... 
## min values  : 1100015, Ângulo,      11,      AC, Abadia De Goiás - GO,          0,                    0,                  0,                  0,                0,                 0,                0,            0,             0,           0, ... 
## max values  : 5300108, Zortéa,      53,      TO,          Zortéa - SC,  314637.69,                    1,  0.377218184890992,  0.709886841723488,                1, 0.954473822447908, 106.960227272727,            1,         0.825,   655505.29, ...
brazil_cities_summ.res.sp <- as_Spatial(brazil_cities_summ.res.sf)
brazil_cities_summ.res.sp
## class       : SpatialPointsDataFrame 
## features    : 5573 
## extent      : 1671725, 6175358, 6039171, 10507274  (xmin, xmax, ymin, ymax)
## crs         : +proj=merc +lon_0=-43 +lat_ts=-2 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 38
## names       :           City_State, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage,  Rural_Percentage,  Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES,  GVA_PUBLIC, COMP_A, ... 
## min values  : Abadia De Goiás - GO,          0,                    0,                  0,                  0,                0,                 0,                0,            0,             0,           0,            0,            0,           0,      0, ... 
## max values  :          Zortéa - SC,  314637.69,                    1,  0.377218184890992,  0.709886841723488,                1, 0.954473822447908, 106.960227272727,            1,         0.825,   655505.29,  15043914.83,   53213121.5, 10664796.91,   1948, ...

The code chunk below is used to create a choropleth map of the residuals (MLR_RES).

tm_shape(mun_brazil_cities_mlr.sp) +  
  tm_polygons(col = "MLR_RES",
              alpha = 0.6,
              style="quantile",
              popup.vars = c("City_State","MLR_RES"))
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

The figure above reveals that there are signs of spatial autocorrelation.

To confirm that this observation holds, the Moran's I test will be performed.
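As a rough sketch of what Moran's I measures, the base R snippet below computes the statistic by hand for simulated points. Everything in it is made up for illustration: the coordinates, the values, and the use of plain binary distance-band weights (spdep functions may apply a different weighting style). The distance band is set to the largest nearest-neighbour distance so that every point has at least one neighbour.

```r
set.seed(7)
n  <- 50
xy <- cbind(runif(n), runif(n))       # toy point coordinates
z  <- xy[, 1] + rnorm(n, sd = 0.1)    # toy "residuals" with a west-east trend

d <- as.matrix(dist(xy))              # pairwise Euclidean distances
diag(d) <- Inf                        # a point is not its own neighbour
upper <- max(apply(d, 1, min))        # largest nearest-neighbour distance
w <- (d <= upper) * 1                 # binary distance-band weights

# Moran's I: weighted cross-products of deviations, scaled by the variance.
zc <- z - mean(z)
morans_i <- (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
morans_i
```

A value clearly above the null expectation of -1/(n - 1) indicates that nearby points tend to have similar values, which is the pattern the test looks for in the regression residuals.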

Next, we will compute the distance-based weight matrix by using the dnearneigh() function of spdep. I will first find the nearest-neighbour distances to determine the lower and upper distance bounds (in metres) to be used in dnearneigh().

coords <- coordinates(brazil_cities_summ.res.sp) 
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = FALSE))
summary(k1dists)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    645.7   9640.9  13913.2  17755.3  20606.1 363945.4
nb <- dnearneigh(coordinates(brazil_cities_summ.res.sp), 0, 371000, longlat = FALSE)
summary(nb)
## Neighbour list object:
## Number of regions: 5573 
## Number of nonzero links: 2783624 
## Percentage nonzero weights: 8.962568 
## Average number of links: 499.4839 
## Link number distribution:
## 
##   3   7   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26 
##   1   1   2   3   4   7   5  14   7   7   7   5   1   3   4   3   3   6   4   1 
##  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46 
##   8   4   5   3   3   8   2   8   5  11   5   2   5   3   5   2   1   1   3   6 
##  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66 
##   7   7   5   2   8   8  15   7   3   7   4   4   7   4   2   3   3   3   4   2 
##  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86 
##   2   5   2   2   2   4   5   6   4   4   3   2   1   2   4   7   1   1   4   2 
##  88  89  93  94  96  97  99 100 101 102 103 104 105 106 107 108 109 110 111 112 
##   4   2   1   1   1   2   1   3   1   3   7   2   2   4   5   3   2   3   3   4 
## 113 114 115 116 117 118 121 122 123 124 125 126 127 128 129 130 131 132 133 134 
##   2   3   1   4   5   1   3   2   3   3   3   1   6   2   4   3   5   9   4   2 
## 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 
##   1   4   5   6   4   4   8   8  10   1   6   7   9   8   6   7   6   3   5   9 
## 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 175 
##   1   4   3   6   3   3   4   7   5   5   6   6   1   3   3   5   3   4   9   6 
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 197 
##   5   3   3   2   1   4   1   6   3   5   6   1   5   3   2   4   3   2   2   4 
## 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 214 215 216 218 219 
##   5   2   2   5   4   4   3   4   3   5   4   7   1   5   1   4   2   7   5   5 
## 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 
##   5   7   6   4   4   6   4   5   6   8   8   7   4   3   6   5   9   5   3   2 
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 
##   4   7   6  10   7   5   4   8   2   1   9   7   7   9   4   7   6   9   4   4 
## 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 
##   4   5   8   7  10   7   7  11   6  11   6   4   6  10   7   6   5   6   6   6 
## 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 
##   2   6   5   2   4   2   6  11   4   5   4   3   4   4   4   6   4   2   4   2 
## 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 
##   5   4   7   8   6   2   4   5   4   2   8   9   6   1   7  11   3   4   9   9 
## 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 
##   6  10   3   3  10   6   7   5   5   4   3   7   5   3   7   9  11   4   5   8 
## 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 
##   6   7   4   4   3   8  10   3   6   6  10   7   8   6   6   6   7   9   5  11 
## 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 
##  12   8   7   7   7   8   8   5   5  11   5  12  13  13   7   8   9   7  10   7 
## 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 
##   9   8   7   6   9   4  11   6   6   8  13  11  14   8   7   3   5   2   9   8 
## 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 
##   4  12   7   6   2   5   6   9   2   6   4   6   5   6   9  10   4  12   6   6 
## 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 
##   9   4   7   4  13   5   4   6   3   7   2   4   7   4   2   4   5   1   8   5 
## 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 
##   2  10   5   7   4   3   1   7   4   4   3   5   4   4  11   8   3   3   3   3 
## 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 
##   3   2   3   5   5   4   2   3   7   8   2   9   5   7   6   6  12   7   5   6 
## 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 
##   4   4   3   5   5   6   6   9   4   7   5   5   3   5   6   2  11   8   6   4 
## 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 
##   5   5   6   3   6   9   3   4   7   7   4   6   2   3   6   6   8   8   6   4 
## 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 
##   5   6   5   8  10  10   6   5  13   4   5   7   5   7   4   8  11   6   8   3 
## 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 
##   4   3   9   8   2   6   7   5   9  12   1   6   6  10   7   5   5   9   5  10 
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 
##   5   8  10   3   6  11  12  10   6   8  13  11   9   6   8  16   8  11  14  11 
## 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 
##  13  13   9  10  13   7  10  12   4   7   9   9  11  10   4   7  10  11   9   9 
## 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 
##  12  11   6  14   9   5   8   8   7  10   7  12   8   8  15   9   9  12   7  12 
## 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 
##   8  13  14  13   6   8  10  10  11   8  12   5  11  16   8  12  12  11   7  17 
## 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 
##  17   6   9  13   6  14  13   9   5  12  11  13  14  14  14  12  11   6   9  12 
## 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 
##  12   6   8   8   9   7  11   6   9  10  15  10  14   7   5   7   9  11  14   9 
## 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 
##   5  18   4   7  19   9  15  12   8  13  11   8  14   6  11  11   8  15  11   7 
## 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 
##  13  11   8   9  12  14  15   7  10  11  12   7  10  13   9   9   8  13  11   6 
## 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 
##   4   7   8   7   8   9   7  10  11  14   5  13  12   4   6   5  10  10   9  11 
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 
##  18  17   6   6  11  11   5   7   8  10  10   8  17   6   9  12   7  15   5   8 
## 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 
##   7  14   8   6   9   6   6   8   8   6   8  12   9   8   3  11  10   6  14   7 
## 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 
##  12  10   7   9   5   8   9  12   6   6   5  11  10   8  12   8   8   9  10   8 
## 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 
##  12   8   8   6   5   7   6   9   9   6   9   7   7  11   3   9   8   9   9   8 
## 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 
##   2   9   7   6   4   9   9  13  11   6  10   6   3   2   4   7   6   3   7   8 
## 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 
##   5   7   4   3   5   6   8   4   3   3   6   6   4   5   1   4   7   5   4   4 
## 860 861 862 863 864 865 866 867 868 870 
##   5   2   1   4   2   1   1   2   1   1 
## 1 least connected region:
## 1774 with 3 links
## 1 most connected region:
## 4046 with 870 links
nb_lw <- nb2listw(nb, style = 'W')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5573 
## Number of nonzero links: 2783624 
## Percentage nonzero weights: 8.962568 
## Average number of links: 499.4839 
## Link number distribution:
## 
##   3   7   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26 
##   1   1   2   3   4   7   5  14   7   7   7   5   1   3   4   3   3   6   4   1 
##  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46 
##   8   4   5   3   3   8   2   8   5  11   5   2   5   3   5   2   1   1   3   6 
##  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66 
##   7   7   5   2   8   8  15   7   3   7   4   4   7   4   2   3   3   3   4   2 
##  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86 
##   2   5   2   2   2   4   5   6   4   4   3   2   1   2   4   7   1   1   4   2 
##  88  89  93  94  96  97  99 100 101 102 103 104 105 106 107 108 109 110 111 112 
##   4   2   1   1   1   2   1   3   1   3   7   2   2   4   5   3   2   3   3   4 
## 113 114 115 116 117 118 121 122 123 124 125 126 127 128 129 130 131 132 133 134 
##   2   3   1   4   5   1   3   2   3   3   3   1   6   2   4   3   5   9   4   2 
## 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 
##   1   4   5   6   4   4   8   8  10   1   6   7   9   8   6   7   6   3   5   9 
## 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 175 
##   1   4   3   6   3   3   4   7   5   5   6   6   1   3   3   5   3   4   9   6 
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 197 
##   5   3   3   2   1   4   1   6   3   5   6   1   5   3   2   4   3   2   2   4 
## 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 214 215 216 218 219 
##   5   2   2   5   4   4   3   4   3   5   4   7   1   5   1   4   2   7   5   5 
## 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 
##   5   7   6   4   4   6   4   5   6   8   8   7   4   3   6   5   9   5   3   2 
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 
##   4   7   6  10   7   5   4   8   2   1   9   7   7   9   4   7   6   9   4   4 
## 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 
##   4   5   8   7  10   7   7  11   6  11   6   4   6  10   7   6   5   6   6   6 
## 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 
##   2   6   5   2   4   2   6  11   4   5   4   3   4   4   4   6   4   2   4   2 
## 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 
##   5   4   7   8   6   2   4   5   4   2   8   9   6   1   7  11   3   4   9   9 
## 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 
##   6  10   3   3  10   6   7   5   5   4   3   7   5   3   7   9  11   4   5   8 
## 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 
##   6   7   4   4   3   8  10   3   6   6  10   7   8   6   6   6   7   9   5  11 
## 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 
##  12   8   7   7   7   8   8   5   5  11   5  12  13  13   7   8   9   7  10   7 
## 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 
##   9   8   7   6   9   4  11   6   6   8  13  11  14   8   7   3   5   2   9   8 
## 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 
##   4  12   7   6   2   5   6   9   2   6   4   6   5   6   9  10   4  12   6   6 
## 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 
##   9   4   7   4  13   5   4   6   3   7   2   4   7   4   2   4   5   1   8   5 
## 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 
##   2  10   5   7   4   3   1   7   4   4   3   5   4   4  11   8   3   3   3   3 
## 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 
##   3   2   3   5   5   4   2   3   7   8   2   9   5   7   6   6  12   7   5   6 
## 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 
##   4   4   3   5   5   6   6   9   4   7   5   5   3   5   6   2  11   8   6   4 
## 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 
##   5   5   6   3   6   9   3   4   7   7   4   6   2   3   6   6   8   8   6   4 
## 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 
##   5   6   5   8  10  10   6   5  13   4   5   7   5   7   4   8  11   6   8   3 
## 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 
##   4   3   9   8   2   6   7   5   9  12   1   6   6  10   7   5   5   9   5  10 
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 
##   5   8  10   3   6  11  12  10   6   8  13  11   9   6   8  16   8  11  14  11 
## 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 
##  13  13   9  10  13   7  10  12   4   7   9   9  11  10   4   7  10  11   9   9 
## 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 
##  12  11   6  14   9   5   8   8   7  10   7  12   8   8  15   9   9  12   7  12 
## 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 
##   8  13  14  13   6   8  10  10  11   8  12   5  11  16   8  12  12  11   7  17 
## 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 
##  17   6   9  13   6  14  13   9   5  12  11  13  14  14  14  12  11   6   9  12 
## 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 
##  12   6   8   8   9   7  11   6   9  10  15  10  14   7   5   7   9  11  14   9 
## 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 
##   5  18   4   7  19   9  15  12   8  13  11   8  14   6  11  11   8  15  11   7 
## 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 
##  13  11   8   9  12  14  15   7  10  11  12   7  10  13   9   9   8  13  11   6 
## 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 
##   4   7   8   7   8   9   7  10  11  14   5  13  12   4   6   5  10  10   9  11 
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 
##  18  17   6   6  11  11   5   7   8  10  10   8  17   6   9  12   7  15   5   8 
## 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 
##   7  14   8   6   9   6   6   8   8   6   8  12   9   8   3  11  10   6  14   7 
## 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 
##  12  10   7   9   5   8   9  12   6   6   5  11  10   8  12   8   8   9  10   8 
## 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 
##  12   8   8   6   5   7   6   9   9   6   9   7   7  11   3   9   8   9   9   8 
## 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 
##   2   9   7   6   4   9   9  13  11   6  10   6   3   2   4   7   6   3   7   8 
## 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 
##   5   7   4   3   5   6   8   4   3   3   6   6   4   5   1   4   7   5   4   4 
## 860 861 862 863 864 865 866 867 868 870 
##   5   2   1   4   2   1   1   2   1   1 
## 1 least connected region:
## 1774 with 3 links
## 1 most connected region:
## 4046 with 870 links
## 
## Weights style: W 
## Weights constants summary:
##      n       nn   S0       S1       S2
## W 5573 31058329 5573 46.16184 22428.61
lm.morantest(brazil.mlr1, nb_lw)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = GDP_CAPITA ~ Brazilian_Percentage +
## Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC +
## COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data =
## brazil_cities_summ.sf)
## weights: nb_lw
## 
## Moran I statistic standard deviate = 41.846, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     4.796024e-02    -3.500424e-04     1.332827e-06

The results show that the p-value of the Moran's I test is less than 2.2e-16, which is smaller than the alpha value of 0.05. Hence, we reject the null hypothesis that the residuals are randomly distributed. Since the observed Global Moran's I of 0.04796024 is positive, the residuals show positive spatial autocorrelation, which means they show signs of clustering, even though the clustering is not very strong.
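As a quick check on the test output above, the standard deviate can be recomputed from the three sample estimates. This is a minimal sketch in base R: the variable names are illustrative, the values are copied from the lm.morantest() output, and the one-sided normal approximation is an assumption chosen to match "alternative hypothesis: greater".

```r
# Values copied from the lm.morantest() output above
observed_i <- 4.796024e-02
expected_i <- -3.500424e-04
variance_i <- 1.332827e-06

# Standard deviate: (observed - expected) / standard error
z_score <- (observed_i - expected_i) / sqrt(variance_i)

# One-sided p-value under a normal approximation (alternative: greater)
p_value <- pnorm(z_score, lower.tail = FALSE)

print(round(z_score, 3))
print(p_value < 2.2e-16)
```

The recomputed deviate should agree with the reported value of 41.846 up to rounding of the printed estimates.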

8 Building GWR Models using GWmodel

In this section, I will build GWR models of GDP per capita using both the fixed and adaptive bandwidth schemes, and compare them to decide which one to use for the GWR visualisation.

8.1 Building Fixed Bandwidth GWR Model

8.1.1 Computing fixed bandwidth

In the code chunk below, bw.gwr() of the GWmodel package is used to determine the optimal fixed bandwidth for the model. I use the cross-validation (CV) approach as the bandwidth selection criterion. adaptive is set to FALSE because I am using the fixed bandwidth method, where the bandwidth is a fixed distance. The kernel is gaussian, where the weight is computed as wgt = exp(-.5*(vdist/bw)^2), and longlat is FALSE, the same as in the dnearneigh() call above.

bw.fixed <- bw.gwr(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 3921308 CV score: 1.395285e+12 
## Fixed bandwidth: 2423986 CV score: 1.378261e+12 
## Fixed bandwidth: 1498590 CV score: 1.348028e+12 
## Fixed bandwidth: 926664.5 CV score: 1.338012e+12 
## Fixed bandwidth: 573194.8 CV score: 1.336135e+12 
## Fixed bandwidth: 354738.5 CV score: 1.504225e+12 
## Fixed bandwidth: 708208.2 CV score: 1.338323e+12 
## Fixed bandwidth: 489751.9 CV score: 1.337712e+12 
## Fixed bandwidth: 624765.3 CV score: 1.336911e+12 
## Fixed bandwidth: 541322.4 CV score: 1.335897e+12 
## Fixed bandwidth: 521624.2 CV score: 1.336033e+12 
## Fixed bandwidth: 553496.6 CV score: 1.335945e+12 
## Fixed bandwidth: 533798.4 CV score: 1.335909e+12 
## Fixed bandwidth: 545972.5 CV score: 1.335907e+12 
## Fixed bandwidth: 538448.5 CV score: 1.335897e+12 
## Fixed bandwidth: 543098.6 CV score: 1.335899e+12 
## Fixed bandwidth: 540224.7 CV score: 1.335896e+12 
## Fixed bandwidth: 539546.3 CV score: 1.335897e+12 
## Fixed bandwidth: 540644 CV score: 1.335897e+12 
## Fixed bandwidth: 539965.6 CV score: 1.335896e+12 
## Fixed bandwidth: 539805.4 CV score: 1.335896e+12 
## Fixed bandwidth: 540064.5 CV score: 1.335896e+12 
## Fixed bandwidth: 539904.4 CV score: 1.335896e+12 
## Fixed bandwidth: 540003.4 CV score: 1.335896e+12 
## Fixed bandwidth: 539942.2 CV score: 1.335896e+12 
## Fixed bandwidth: 539980 CV score: 1.335896e+12 
## Fixed bandwidth: 539956.6 CV score: 1.335896e+12 
## Fixed bandwidth: 539951.1 CV score: 1.335896e+12 
## Fixed bandwidth: 539960 CV score: 1.335896e+12 
## Fixed bandwidth: 539954.5 CV score: 1.335896e+12 
## Fixed bandwidth: 539953.2 CV score: 1.335896e+12 
## Fixed bandwidth: 539955.3 CV score: 1.335896e+12 
## Fixed bandwidth: 539955.8 CV score: 1.335896e+12

The result shows that the recommended bandwidth is 539955.3 metres. The unit is metres because the data use the SIRGAS 2000 projection, whose coordinates are in metres.
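To make the bandwidth search more concrete, here is a minimal sketch in base R of a leave-one-out cross-validation score with the gaussian kernel quoted above. The data are simulated, all names (gauss_wgt, cv_score, dist_mat) are hypothetical, and optimize() is only a stand-in for whatever search routine bw.gwr() uses internally.

```r
# Gaussian kernel, following the formula wgt = exp(-.5*(vdist/bw)^2)
gauss_wgt <- function(vdist, bw) exp(-0.5 * (vdist / bw)^2)

# Toy data (hypothetical): 40 random points whose slope drifts eastwards
set.seed(1)
n <- 40
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)
y <- 1 + (1 + coords[, 1]) * x + rnorm(n, sd = 0.1)
dist_mat <- as.matrix(dist(coords))

# Leave-one-out CV score for one candidate bandwidth
cv_score <- function(bw) {
  sq_err <- numeric(n)
  for (i in seq_len(n)) {
    w <- gauss_wgt(dist_mat[i, ], bw)
    w[i] <- 0                              # drop point i from its own fit
    fit <- lm(y ~ x, weights = w)          # locally weighted regression
    pred <- predict(fit, newdata = data.frame(x = x[i]))
    sq_err[i] <- (y[i] - pred)^2
  }
  sum(sq_err)
}

# Search for the bandwidth that minimises the CV score
best <- optimize(cv_score, interval = c(0.1, 2))
print(best$minimum)
print(best$objective)
```

Each line of the bw.gwr() log above can be read in the same way: a candidate bandwidth followed by its CV score, with the search narrowing towards the minimum.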

8.1.2 GWmodel method - fixed bandwidth

Now we can use the code chunk below to calibrate the GWR model using the fixed bandwidth and gaussian kernel. The code chunk takes all the municipality points and fits the regression with a bandwidth of 539955.3 metres.

gwr.fixed <- gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, bw=bw.fixed, kernel = 'gaussian', longlat = FALSE)

The output is saved in a list of class “gwrm”. The code below can be used to display the model output.

gwr.fixed
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 21:45:36 
##    Call:
##    gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + 
##     Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + 
##     COMP_O + COMP_U + GVA_INDUSTRY, data = brazil_cities_summ.res.sp, 
##     bw = bw.fixed, kernel = "gaussian", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  Brazilian_Percentage Working_Percentage Tourism_Area IDHM_Educacao GVA_AGROPEC COMP_D COMP_E COMP_O COMP_U GVA_INDUSTRY
##    Number of data points: 5573
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -90406  -7025  -2105   3835 260725 
## 
##    Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          -2.404e+03  5.502e+03  -0.437   0.6621    
##    Brazilian_Percentage -2.242e+04  5.617e+03  -3.991 6.65e-05 ***
##    Working_Percentage   -1.121e+04  1.865e+03  -6.008 2.00e-09 ***
##    Tourism_Area          7.119e+03  4.617e+02  15.418  < 2e-16 ***
##    IDHM_Educacao         6.875e+04  2.981e+03  23.063  < 2e-16 ***
##    GVA_AGROPEC           1.004e-01  5.164e-03  19.448  < 2e-16 ***
##    COMP_D               -3.039e+02  1.266e+02  -2.402   0.0164 *  
##    COMP_E               -8.464e+02  6.386e+01 -13.254  < 2e-16 ***
##    COMP_O                5.995e+02  9.456e+01   6.340 2.47e-10 ***
##    COMP_U               -7.616e+03  1.724e+03  -4.418 1.01e-05 ***
##    GVA_INDUSTRY          1.406e-02  4.732e-04  29.713  < 2e-16 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 15700 on 5562 degrees of freedom
##    Multiple R-squared: 0.3925
##    Adjusted R-squared: 0.3915 
##    F-statistic: 359.4 on 10 and 5562 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 1.370232e+12
##    Sigma(hat): 15683.05
##    AIC:  123511.6
##    AICc:  123511.6
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 539955.3 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.
##    Intercept            -1.8556e+04 -3.4687e+03 -1.1980e+03  3.3629e+03
##    Brazilian_Percentage -2.7412e+04 -2.4407e+04 -1.3465e+04  2.2394e+03
##    Working_Percentage   -2.8761e+04 -1.4797e+04 -9.9711e+02  3.0143e+03
##    Tourism_Area         -1.9085e+03  3.8788e+03  4.8524e+03  5.3596e+03
##    IDHM_Educacao        -9.5579e+02  1.1444e+04  5.2686e+04  6.9595e+04
##    GVA_AGROPEC           4.5468e-02  6.7269e-02  8.2598e-02  9.7079e-02
##    COMP_D               -4.4549e+03 -1.5244e+03 -9.4496e+02 -7.2777e+02
##    COMP_E               -1.4207e+04 -1.0215e+03 -8.8549e+02 -6.9897e+02
##    COMP_O               -5.9314e+02  1.7239e+02  1.4658e+03  1.8282e+03
##    COMP_U               -9.3366e+05 -1.6689e+04 -8.4031e+03 -1.4520e+02
##    GVA_INDUSTRY          1.2729e-02  1.3531e-02  1.5149e-02  2.2198e-02
##                               Max.
##    Intercept             5535.7426
##    Brazilian_Percentage  4183.2976
##    Working_Percentage    6797.9479
##    Tourism_Area          9485.6013
##    IDHM_Educacao        83829.7524
##    GVA_AGROPEC              0.3683
##    COMP_D               49994.1585
##    COMP_E                -369.0832
##    COMP_O                5630.5519
##    COMP_U                5858.0682
##    GVA_INDUSTRY             0.2505
##    ************************Diagnostic information*************************
##    Number of data points: 5573 
##    Effective number of parameters (2trace(S) - trace(S'S)): 66.16194 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5506.838 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122818.1 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 122765.2 
##    Residual sum of squares: 1.192907e+12 
##    R-square value:  0.4711589 
##    Adjusted R-square value:  0.464804 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 21:45:55

From the results above, I am able to see that the geographically weighted regression performs better than the global multiple regression. The adjusted R-square has increased from 0.3915 to 0.464804, and the AIC has decreased from 123511.6 to 122765.2. Hence, the geographically weighted regression gives a better model than the global multiple regression model.
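The reported adjusted R-square can be cross-checked from the other diagnostics in the output. The sketch below uses base R with values copied from the gwr.fixed output. The adjustment formula is an assumption inferred from the printed numbers, not taken from the GWmodel source, and the variable names are illustrative.

```r
# Diagnostics copied from the gwr.fixed output above
n_obs      <- 5573
edf        <- 5506.838      # effective degrees of freedom
r2_gwr     <- 0.4711589
rss_global <- 1.370232e12
rss_gwr    <- 1.192907e12

# Assumed adjustment: penalise R-square by the effective degrees of freedom
adj_r2_gwr <- 1 - (1 - r2_gwr) * (n_obs - 1) / (edf - 1)

# Share of the global residual sum of squares removed by the GWR fit
rss_reduction <- 1 - rss_gwr / rss_global

print(round(adj_r2_gwr, 4))
print(round(rss_reduction, 4))
```

Under this assumption the result lines up with the reported adjusted R-square of 0.464804, and the residual sum of squares falls by roughly 13% relative to the global model.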

8.2 Building Adaptive Bandwidth GWR Model

Since I will be comparing the adaptive and fixed bandwidth models, I will now calibrate the GWR model using the adaptive bandwidth approach.

8.2.1 Computing the adaptive bandwidth

The code chunk is similar to the one used to compute the fixed bandwidth, except that the adaptive argument is changed to TRUE. With adaptive set to TRUE, the kernel is adaptive and the bandwidth (bw) corresponds to a number of nearest neighbours rather than a fixed distance. Besides that, I pass a pre-computed distance matrix as dMat, which is built with gw.dist() from the spatial coordinates of the points.

DM<-gw.dist(dp.locat=coordinates(brazil_cities_summ.res.sp))
bw.adaptive <- bw.gwr(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, approach="CV", kernel="gaussian", adaptive=TRUE, longlat=FALSE, dMat= DM)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 3451 CV score: 1.363368e+12 
## Adaptive bandwidth: 2141 CV score: 1.331684e+12 
## Adaptive bandwidth: 1329 CV score: 1.335087e+12 
## Adaptive bandwidth: 2640 CV score: 1.344421e+12 
## Adaptive bandwidth: 1829 CV score: 1.324066e+12 
## Adaptive bandwidth: 1640 CV score: 1.323113e+12 
## Adaptive bandwidth: 1519 CV score: 1.326376e+12 
## Adaptive bandwidth: 1710 CV score: 1.323179e+12 
## Adaptive bandwidth: 1591 CV score: 1.323727e+12 
## Adaptive bandwidth: 1664 CV score: 1.322965e+12 
## Adaptive bandwidth: 1685 CV score: 1.32311e+12 
## Adaptive bandwidth: 1657 CV score: 1.322963e+12 
## Adaptive bandwidth: 1646 CV score: 1.322912e+12 
## Adaptive bandwidth: 1646 CV score: 1.322912e+12

The result shows that 1646 is the recommended number of nearest neighbours (data points) to use.
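To illustrate what a bandwidth expressed as a number of nearest neighbours means, the sketch below computes the distance from each point to its k-th nearest neighbour for a toy set of coordinates. It uses base R only, the coordinates and the helper kth_neighbour_dist() are hypothetical, and whether GWmodel counts the point itself among the neighbours is not confirmed here.

```r
# Toy coordinates (hypothetical): five points on a line
coords <- cbind(x = c(0, 1, 2, 4, 8), y = 0)

# Pairwise Euclidean distances, similar in spirit to the DM matrix above
dist_mat <- as.matrix(dist(coords))

# Distance from each point to its k-th nearest neighbour (self excluded)
kth_neighbour_dist <- function(d, k) {
  # After sorting, position 1 is the point itself (distance 0)
  apply(d, 1, function(row) sort(row)[k + 1])
}

local_bw <- kth_neighbour_dist(dist_mat, k = 2)
print(local_bw)
```

Points in dense areas get a short local distance and isolated points get a long one, which is why an adaptive bandwidth suits data where municipalities are unevenly spread.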

8.2.2 Constructing the adaptive bandwidth gwr model

Now, I will calibrate the GWR model by using the adaptive bandwidth and gaussian kernel as shown in the code chunk below.

gwr.adaptive <- gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, bw=bw.adaptive, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat= DM)

The code below can be used to display the model output.

gwr.adaptive
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 21:48:13 
##    Call:
##    gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + 
##     Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + 
##     COMP_O + COMP_U + GVA_INDUSTRY, data = brazil_cities_summ.res.sp, 
##     bw = bw.adaptive, kernel = "gaussian", adaptive = TRUE, longlat = FALSE, 
##     dMat = DM)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  Brazilian_Percentage Working_Percentage Tourism_Area IDHM_Educacao GVA_AGROPEC COMP_D COMP_E COMP_O COMP_U GVA_INDUSTRY
##    Number of data points: 5573
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -90406  -7025  -2105   3835 260725 
## 
##    Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          -2.404e+03  5.502e+03  -0.437   0.6621    
##    Brazilian_Percentage -2.242e+04  5.617e+03  -3.991 6.65e-05 ***
##    Working_Percentage   -1.121e+04  1.865e+03  -6.008 2.00e-09 ***
##    Tourism_Area          7.119e+03  4.617e+02  15.418  < 2e-16 ***
##    IDHM_Educacao         6.875e+04  2.981e+03  23.063  < 2e-16 ***
##    GVA_AGROPEC           1.004e-01  5.164e-03  19.448  < 2e-16 ***
##    COMP_D               -3.039e+02  1.266e+02  -2.402   0.0164 *  
##    COMP_E               -8.464e+02  6.386e+01 -13.254  < 2e-16 ***
##    COMP_O                5.995e+02  9.456e+01   6.340 2.47e-10 ***
##    COMP_U               -7.616e+03  1.724e+03  -4.418 1.01e-05 ***
##    GVA_INDUSTRY          1.406e-02  4.732e-04  29.713  < 2e-16 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 15700 on 5562 degrees of freedom
##    Multiple R-squared: 0.3925
##    Adjusted R-squared: 0.3915 
##    F-statistic: 359.4 on 10 and 5562 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 1.370232e+12
##    Sigma(hat): 15683.05
##    AIC:  123511.6
##    AICc:  123511.6
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Adaptive bandwidth: 1646 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: A distance matrix is specified for this model calibration.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.
##    Intercept            -1.1095e+04 -6.3114e+03 -4.0954e+03  2.4134e+03
##    Brazilian_Percentage -2.7348e+04 -2.5665e+04 -1.9347e+04 -6.4303e+03
##    Working_Percentage   -2.2478e+04 -1.5894e+04 -6.9321e+03  2.7160e+03
##    Tourism_Area          4.3196e+03  5.1647e+03  5.6113e+03  6.0234e+03
##    IDHM_Educacao         1.8012e+04  3.7979e+04  6.3580e+04  7.2073e+04
##    GVA_AGROPEC           6.3749e-02  8.0205e-02  8.6906e-02  9.7140e-02
##    COMP_D               -1.5274e+03 -9.4935e+02 -7.1588e+02 -4.3114e+02
##    COMP_E               -1.0454e+03 -9.4000e+02 -8.9357e+02 -8.6326e+02
##    COMP_O               -1.3658e+01  3.1497e+02  1.1099e+03  1.6933e+03
##    COMP_U               -1.9692e+04 -1.1752e+04 -8.7975e+03 -6.9979e+03
##    GVA_INDUSTRY          1.3008e-02  1.3433e-02  1.4026e-02  1.6852e-02
##                               Max.
##    Intercept             4808.6564
##    Brazilian_Percentage   -40.6551
##    Working_Percentage    4784.9676
##    Tourism_Area          7608.9509
##    IDHM_Educacao        78617.7536
##    GVA_AGROPEC              0.1100
##    COMP_D                -140.3515
##    COMP_E                -659.5818
##    COMP_O                1935.3819
##    COMP_U               -1628.7573
##    GVA_INDUSTRY             0.0209
##    ************************Diagnostic information*************************
##    Number of data points: 5573 
##    Effective number of parameters (2trace(S) - trace(S'S)): 30.53195 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5542.468 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 123049.7 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 123023.5 
##    Residual sum of squares: 1.255332e+12 
##    R-square value:  0.4434843 
##    Adjusted R-square value:  0.440418 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 21:48:35

From the results above, I am able to see that the geographically weighted regression with adaptive bandwidth also performs better than the global multiple regression. The adjusted R-square has increased from 0.3915 to 0.440418, and the AIC has decreased from 123511.6 to 123023.5. Hence, the geographically weighted regression gives a better model than the global multiple regression model.

Comparing the adaptive and fixed bandwidth methods, I will be using the fixed bandwidth method because its adjusted R-square is higher (0.464804 versus 0.440418) and its AICc is lower (122818.1 versus 123049.7).
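The comparison can be summarised in a small table. This is a sketch in base R using only the adjusted R-square and AICc values reported in the outputs above. The data frame and column names are illustrative.

```r
# Diagnostics copied from the model outputs above
model_summary <- data.frame(
  model  = c("Global regression", "GWR fixed", "GWR adaptive"),
  adj_r2 = c(0.3915, 0.464804, 0.440418),
  aicc   = c(123511.6, 122818.1, 123049.7),
  stringsAsFactors = FALSE
)

# Difference in AICc relative to the best (lowest) model
model_summary$delta_aicc <- model_summary$aicc - min(model_summary$aicc)

# Model with the lowest AICc
best_model <- model_summary$model[which.min(model_summary$aicc)]

print(model_summary[order(model_summary$aicc), ])
print(best_model)
```

Both criteria point the same way here, so the choice of the fixed bandwidth model does not depend on which of the two is preferred.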

9 Visualising GWR Output

In this exercise, I will be using the Local R2 values as well as the variable coefficients. The relevant fields of the GWR output are described below.

  • Local R2: these values range between 0.0 and 1.0 and indicate how well the local regression model fits observed y values. Very low values indicate the local model is performing poorly. Mapping the Local R2 values to see where GWR predicts well and where it predicts poorly may provide clues about important variables that may be missing from the regression model.

  • Coefficient Standard Error: these values measure the reliability of each coefficient estimate. Confidence in those estimates is higher when standard errors are small in relation to the actual coefficient values. Large standard errors may indicate problems with local collinearity.

All of these are stored in the SDF element of the output list. SDF is a SpatialPointsDataFrame or SpatialPolygonsDataFrame object whose “data” slot holds the fit points, GWR coefficient estimates, y values, predicted values, coefficient standard errors and t-values.
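As a quick check on how these fields relate, the t-value columns (suffix _TV) appear to be the coefficient estimate divided by its standard error (suffix _SE). The sketch below checks this for Tourism_Area using the first three rows of the SDF printout in section 9.1. The 1.96 cut-off is an assumed, conventional 5% two-sided threshold and is not reported by GWmodel.

```r
# First three rows of the SDF printout (values copied from the output in section 9.1)
coef_tourism <- c(5774.894, 5171.673, 5759.233)   # Tourism_Area
se_tourism   <- c(644.4293, 614.0907, 640.2940)   # Tourism_Area_SE
tv_reported  <- c(8.961253, 8.421677, 8.994671)   # Tourism_Area_TV

# t-value = coefficient estimate / standard error
tv_computed <- coef_tourism / se_tourism

# Assumption: |t| > 1.96 is used as a rough 5% significance cut-off
locally_significant <- abs(tv_computed) > 1.96

print(data.frame(tv_reported, tv_computed = round(tv_computed, 6), locally_significant))
```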

9.1 Converting SDF into sf data.frame

To visualise the fields in SDF, we first need to convert it into an sf data.frame by using the code chunk below.

brazil_cities.sf.fixed <- st_as_sf(gwr.fixed$SDF) %>%
  st_transform(crs=4676)
brazil_cities.sf.fixed.sirgas2000 <- st_transform(brazil_cities.sf.fixed, 5641)
brazil_cities.sf.fixed.sirgas2000  
## Simple feature collection with 5573 features and 39 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1671725 ymin: 6039171 xmax: 6175358 ymax: 10507270
## CRS:            EPSG:5641
## First 10 features:
##      Intercept Brazilian_Percentage Working_Percentage Tourism_Area
## 1   -3144.9979           -18544.300        -5545.65498     5774.894
## 2   -3214.9323           -18870.714        -4034.40181     5171.673
## 3   -4898.3327           -16017.149        -3748.36141     5759.233
## 4   -5109.5959           -16568.231        -1461.54560     4933.319
## 5    -378.7536            -2800.052          -56.62049     3410.307
## 6   -2262.6052             3033.102         5211.29736     3502.623
## 7  -13085.9487             2539.424         3244.38306     4936.943
## 8   -3010.4234             3235.564         4925.69194     3573.499
## 9    3612.8936           -24808.871       -16892.61030     5077.629
## 10   5020.6147           -24485.546       -21341.76662     5373.567
##    IDHM_Educacao GVA_AGROPEC     COMP_D     COMP_E    COMP_O     COMP_U
## 1      56678.151  0.11435641  -888.6802 -1200.3418 2363.1734 -19473.776
## 2      58668.416  0.09611855  -777.2283 -1118.0254 2028.6632 -16932.117
## 3      53634.852  0.11682160  -736.8842 -1184.1499 2250.8948 -23534.834
## 4      56942.664  0.08784055  -563.1791 -1061.4201 1702.3320 -18204.853
## 5      12526.243  0.11257821 -3116.2114  -927.2417 1066.3260  -5448.255
## 6       6230.986  0.05016339 -2016.9248  -607.8074 -410.0446   4466.898
## 7      29460.271  0.07457199  -748.1338  -889.1235  319.1103 -17946.766
## 8       7408.581  0.05023498 -2161.7728  -576.0529 -470.9893   4827.403
## 9      71070.032  0.08363214 -1004.7836  -958.0344 1923.1378  -8543.725
## 10     73526.533  0.08989688  -923.9524  -813.1687 1582.1872  -6852.952
##    GVA_INDUSTRY        y      yhat   residual CV_Score Stud_residual
## 1    0.01535003 20664.57 18190.192  2474.3782        0    0.16868894
## 2    0.01454948 25591.70 20763.580  4828.1199        0    0.32853246
## 3    0.01543732     0.00  8649.083 -8649.0827        0   -0.58826741
## 4    0.01423313     0.00  9210.683 -9210.6828        0   -0.62663901
## 5    0.01694804     0.00  3527.503 -3527.5026        0   -0.24048510
## 6    0.02540875  6370.41  6112.793   257.6173        0    0.01756732
## 7    0.02095800  6982.70 10392.853 -3410.1527        0   -0.23235544
## 8    0.02632233     0.00  5187.810 -5187.8102        0   -0.35272741
## 9    0.01347565 21173.60 19472.979  1700.6211        0    0.11569231
## 10   0.01305545 24739.02 30303.309 -5564.2894        0   -0.37874989
##    Intercept_SE Brazilian_Percentage_SE Working_Percentage_SE Tourism_Area_SE
## 1     11646.291               11751.671              2925.076        644.4293
## 2      9371.692                9502.062              2714.078        614.0907
## 3     11046.372               11117.922              2962.319        640.2940
## 4      7792.127                7929.122              2788.390        622.5630
## 5     14000.490               14423.756              5692.811       1554.2536
## 6     10704.016               10710.061              3410.244        773.4134
## 7     11303.343               11301.680              3069.502        644.0903
## 8     10508.294               10505.228              3311.623        751.8942
## 9      6819.852                7279.425              2394.739        630.1943
## 10     7056.223                7708.729              2654.521        738.8503
##    IDHM_Educacao_SE GVA_AGROPEC_SE COMP_D_SE COMP_E_SE COMP_O_SE COMP_U_SE
## 1          4799.106    0.007221642  212.9040  79.08249  181.0953  3137.984
## 2          4534.291    0.006814153  192.3377  71.67762  157.9748  2857.792
## 3          4709.190    0.007344399  221.8439  80.16625  175.1305  3260.319
## 4          4561.050    0.006884230  195.1429  71.34929  151.3758  2876.555
## 5          8941.114    0.031811658 1011.8494 516.56482  420.6442  6552.888
## 6          6075.690    0.009148915  467.7670 212.15986  195.1425  3697.514
## 7          4927.449    0.007648189  238.6086 101.54968  123.0517  3012.292
## 8          5871.028    0.009035869  428.2346 191.42099  178.6005  3552.911
## 9          4861.638    0.006179895  153.8894  70.65013  165.4053  2142.220
## 10         5860.345    0.006742507  167.7534  79.60971  198.2892  2097.650
##    GVA_INDUSTRY_SE Intercept_TV Brazilian_Percentage_TV Working_Percentage_TV
## 1     0.0005487270  -0.27004287              -1.5780139          -1.895901265
## 2     0.0005173693  -0.34304715              -1.9859599          -1.486472203
## 3     0.0005547186  -0.44343361              -1.4406603          -1.265347136
## 4     0.0005159234  -0.65573829              -2.0895417          -0.524153888
## 5     0.0034655423  -0.02705288              -0.1941278          -0.009945963
## 6     0.0013291054  -0.21137910               0.2832012           1.528130383
## 7     0.0008045272  -1.15770609               0.2246944           1.056973608
## 8     0.0012830511  -0.28648070               0.3079956           1.487395129
## 9     0.0005035637   0.52976127              -3.4080811          -7.054050414
## 10    0.0005482723   0.71151585              -3.1763403          -8.039779680
##    Tourism_Area_TV IDHM_Educacao_TV GVA_AGROPEC_TV COMP_D_TV  COMP_E_TV
## 1         8.961253        11.810149      15.835236 -4.174089 -15.178351
## 2         8.421677        12.938830      14.105721 -4.040957 -15.597970
## 3         8.994671        11.389401      15.906216 -3.321634 -14.771177
## 4         7.924210        12.484551      12.759677 -2.885982 -14.876394
## 5         2.194177         1.400971       3.538898 -3.079719  -1.795015
## 6         4.528785         1.025560       5.482988 -4.311816  -2.864856
## 7         7.664985         5.978808       9.750281 -3.135402  -8.755552
## 8         4.752662         1.261888       5.559507 -5.048104  -3.009351
## 9         8.057244        14.618535      13.532938 -6.529257 -13.560264
## 10        7.272877        12.546451      13.332858 -5.507801 -10.214442
##    COMP_O_TV COMP_U_TV GVA_INDUSTRY_TV  Local_R2                geometry
## 1  13.049333 -6.205825       27.973893 0.3599996 POINT (4283475 8120684)
## 2  12.841693 -5.924896       28.122037 0.3580482 POINT (4510843 7920105)
## 3  12.852675 -7.218567       27.829099 0.3681975 POINT (4363770 8187113)
## 4  11.245737 -6.328700       27.587675 0.3625905 POINT (4727856 7842028)
## 5   2.534983 -0.831428        4.890445 0.8138496 POINT (4345348 9809515)
## 6  -2.101257  1.208082       19.117183 0.7088541 POINT (5439719 9184727)
## 7   2.593303 -5.957844       26.050083 0.5844539 POINT (5148899 8521972)
## 8  -2.637110  1.358718       20.515417 0.7107090 POINT (5432038 9032202)
## 9  11.626819 -3.988258       26.760575 0.3890840 POINT (4186466 7350101)
## 10  7.979188 -3.266966       23.811986 0.4519263 POINT (4107171 6821956)
gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)
brazil_cities.sf.fixed <- cbind(brazil_cities_summ.sf, as.matrix(gwr.fixed.output))
summary(brazil_cities.sf.fixed)
##   City_State          GDP_CAPITA     Brazilian_Percentage Foreign_Percentage 
##  Length:5573        Min.   :     0   Min.   :0.0000       Min.   :0.0000000  
##  Class :character   1st Qu.:     0   1st Qu.:0.9993       1st Qu.:0.0000000  
##  Mode  :character   Median : 10474   Median :1.0000       Median :0.0000000  
##                     Mean   : 15663   Mean   :0.9978       Mean   :0.0007581  
##                     3rd Qu.: 21967   3rd Qu.:1.0000       3rd Qu.:0.0006981  
##                     Max.   :314638   Max.   :1.0000       Max.   :0.3772182  
##  Working_Percentage Urban_Percentage Rural_Percentage Production_Area  
##  Min.   :0.0000     Min.   :0.0000   Min.   :0.0000   Min.   :  0.000  
##  1st Qu.:0.2815     1st Qu.:0.4907   1st Qu.:0.1686   1st Qu.:  2.442  
##  Median :0.3957     Median :0.6624   Median :0.3370   Median :  3.860  
##  Mean   :0.3969     Mean   :0.6509   Mean   :0.3473   Mean   :  5.325  
##  3rd Qu.:0.5171     3rd Qu.:0.8303   3rd Qu.:0.5080   3rd Qu.:  6.366  
##  Max.   :0.7099     Max.   :1.0000   Max.   :0.9545   Max.   :106.960  
##   Tourism_Area    IDHM_Educacao     GVA_AGROPEC      GVA_INDUSTRY     
##  Min.   :0.0000   Min.   :0.0000   Min.   :     0   Min.   :       0  
##  1st Qu.:0.0000   1st Qu.:0.4900   1st Qu.:     0   1st Qu.:       0  
##  Median :0.0000   Median :0.5600   Median :  6127   Median :    2434  
##  Mean   :0.4384   Mean   :0.5583   Mean   : 22996   Mean   :  110871  
##  3rd Qu.:1.0000   3rd Qu.:0.6310   3rd Qu.: 28262   3rd Qu.:   16754  
##  Max.   :1.0000   Max.   :0.8250   Max.   :655505   Max.   :15043915  
##   GVA_SERVICES        GVA_PUBLIC           COMP_A             COMP_B        
##  Min.   :       0   Min.   :       0   Min.   :   0.000   Min.   :  0.0000  
##  1st Qu.:       0   1st Qu.:       0   1st Qu.:   0.000   1st Qu.:  0.0000  
##  Median :   12535   Median :   19382   Median :   0.000   Median :  0.0000  
##  Mean   :  269933   Mean   :   78376   Mean   :   9.182   Mean   :  0.8012  
##  3rd Qu.:   56919   3rd Qu.:   48201   3rd Qu.:   0.000   3rd Qu.:  0.0000  
##  Max.   :53213122   Max.   :10664797   Max.   :1948.000   Max.   :139.0000  
##      COMP_C            COMP_D             COMP_E           COMP_F       
##  Min.   :   0.00   Min.   :  0.0000   Min.   :  0.00   Min.   :   0.00  
##  1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.:  0.00   1st Qu.:   0.00  
##  Median :   0.00   Median :  0.0000   Median :  0.00   Median :   0.00  
##  Mean   :  44.07   Mean   :  0.1968   Mean   :  1.15   Mean   :  24.25  
##  3rd Qu.:   2.00   3rd Qu.:  0.0000   3rd Qu.:  0.00   3rd Qu.:   1.00  
##  Max.   :6025.00   Max.   :143.0000   Max.   :163.00   Max.   :6373.00  
##      COMP_G            COMP_H            COMP_I            COMP_J       
##  Min.   :    0.0   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:    0.0   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00  
##  Median :    0.0   Median :   0.00   Median :   0.00   Median :   0.00  
##  Mean   :  177.7   Mean   :  22.74   Mean   :  31.12   Mean   :  11.38  
##  3rd Qu.:   22.0   3rd Qu.:   2.00   3rd Qu.:   2.00   3rd Qu.:   0.00  
##  Max.   :33566.0   Max.   :3873.00   Max.   :6514.00   Max.   :4535.00  
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :   0.000   Min.   :   0.000   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:   0.000   1st Qu.:   0.000   1st Qu.:    0.00   1st Qu.:    0.00  
##  Median :   0.000   Median :   0.000   Median :    0.00   Median :    0.00  
##  Mean   :   7.494   Mean   :   8.525   Mean   :   26.47   Mean   :   45.57  
##  3rd Qu.:   0.000   3rd Qu.:   0.000   3rd Qu.:    1.00   3rd Qu.:    1.00  
##  Max.   :3501.000   Max.   :2785.000   Max.   :11925.00   Max.   :17752.00  
##      COMP_O             COMP_P            COMP_Q            COMP_R        
##  Min.   :  0.0000   Min.   :   0.00   Min.   :   0.00   Min.   :   0.000  
##  1st Qu.:  0.0000   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.000  
##  Median :  0.0000   Median :   0.00   Median :   0.00   Median :   0.000  
##  Mean   :  0.9952   Mean   :  14.96   Mean   :  17.38   Mean   :   6.385  
##  3rd Qu.:  1.0000   3rd Qu.:   1.00   3rd Qu.:   0.00   3rd Qu.:   0.000  
##  Max.   :120.0000   Max.   :3325.00   Max.   :4642.00   Max.   :1436.000  
##      COMP_S            COMP_U             HOTELS         Pr_Agencies     
##  Min.   :   0.00   Min.   :0.000000   Min.   : 0.0000   Min.   :  0.000  
##  1st Qu.:   0.00   1st Qu.:0.000000   1st Qu.: 0.0000   1st Qu.:  0.000  
##  Median :   0.00   Median :0.000000   Median : 0.0000   Median :  0.000  
##  Mean   :  24.29   Mean   :0.006639   Mean   : 0.2144   Mean   :  1.012  
##  3rd Qu.:   2.00   3rd Qu.:0.000000   3rd Qu.: 0.0000   3rd Qu.:  0.000  
##  Max.   :5327.00   Max.   :8.000000   Max.   :46.0000   Max.   :273.000  
##   Pu_Agencies         Intercept      Brazilian_Percentage.1
##  Min.   :  0.0000   Min.   :-18556   Min.   :-27412        
##  1st Qu.:  0.0000   1st Qu.: -3469   1st Qu.:-24407        
##  Median :  0.0000   Median : -1198   Median :-13465        
##  Mean   :  0.7886   Mean   : -1656   Mean   :-11602        
##  3rd Qu.:  0.0000   3rd Qu.:  3363   3rd Qu.:  2239        
##  Max.   :168.0000   Max.   :  5536   Max.   :  4183        
##  Working_Percentage.1 Tourism_Area.1  IDHM_Educacao.1   GVA_AGROPEC.1    
##  Min.   :-28760.6     Min.   :-1908   Min.   : -955.8   Min.   :0.04547  
##  1st Qu.:-14797.5     1st Qu.: 3879   1st Qu.:11444.0   1st Qu.:0.06727  
##  Median :  -997.1     Median : 4852   Median :52685.6   Median :0.08260  
##  Mean   : -5547.4     Mean   : 4743   Mean   :43274.5   Mean   :0.08813  
##  3rd Qu.:  3014.3     3rd Qu.: 5360   3rd Qu.:69595.3   3rd Qu.:0.09708  
##  Max.   :  6797.9     Max.   : 9486   Max.   :83829.8   Max.   :0.36826  
##     COMP_D.1          COMP_E.1           COMP_O.1         COMP_U.1        
##  Min.   :-4454.9   Min.   :-14207.4   Min.   :-593.1   Min.   :-933662.5  
##  1st Qu.:-1524.4   1st Qu.: -1021.5   1st Qu.: 172.4   1st Qu.: -16688.8  
##  Median : -945.0   Median :  -885.5   Median :1465.8   Median :  -8403.1  
##  Mean   : -953.9   Mean   :  -934.3   Mean   :1132.3   Mean   : -15114.2  
##  3rd Qu.: -727.8   3rd Qu.:  -699.0   3rd Qu.:1828.2   3rd Qu.:   -145.2  
##  Max.   :49994.2   Max.   :  -369.1   Max.   :5630.6   Max.   :   5858.1  
##  GVA_INDUSTRY.1          y               yhat           residual     
##  Min.   :0.01273   Min.   :     0   Min.   :-13884   Min.   :-64005  
##  1st Qu.:0.01353   1st Qu.:     0   1st Qu.:  6092   1st Qu.: -5671  
##  Median :0.01515   Median : 10474   Median : 11826   Median : -1664  
##  Mean   :0.01830   Mean   : 15663   Mean   : 15862   Mean   :  -199  
##  3rd Qu.:0.02220   3rd Qu.: 21967   3rd Qu.: 23775   3rd Qu.:  2295  
##  Max.   :0.25051   Max.   :314638   Max.   :226947   Max.   :257425  
##     CV_Score Stud_residual      Intercept_SE    Brazilian_Percentage_SE
##  Min.   :0   Min.   :-5.5095   Min.   :  6615   Min.   :  6876         
##  1st Qu.:0   1st Qu.:-0.3866   1st Qu.:  7098   1st Qu.:  7672         
##  Median :0   Median :-0.1135   Median :  9297   Median :  9406         
##  Mean   :0   Mean   :-0.0128   Mean   :  9816   Mean   : 10115         
##  3rd Qu.:0   3rd Qu.: 0.1568   3rd Qu.: 11043   3rd Qu.: 11109         
##  Max.   :0   Max.   :17.5090   Max.   :232748   Max.   :232774         
##  Working_Percentage_SE Tourism_Area_SE   IDHM_Educacao_SE GVA_AGROPEC_SE    
##  Min.   : 2381         Min.   :  610.0   Min.   : 4482    Min.   :0.006165  
##  1st Qu.: 2661         1st Qu.:  646.9   1st Qu.: 4868    1st Qu.:0.006752  
##  Median : 3070         Median :  713.7   Median : 5539    Median :0.007455  
##  Mean   : 3434         Mean   :  917.9   Mean   : 6032    Mean   :0.010913  
##  3rd Qu.: 3565         3rd Qu.:  825.4   3rd Qu.: 6505    3rd Qu.:0.009276  
##  Max.   :22958         Max.   :13650.5   Max.   :27161    Max.   :0.234310  
##    COMP_D_SE         COMP_E_SE         COMP_O_SE        COMP_U_SE     
##  Min.   :  151.8   Min.   :  69.70   Min.   : 116.4   Min.   :  2057  
##  1st Qu.:  168.7   1st Qu.:  74.07   1st Qu.: 157.8   1st Qu.:  2275  
##  Median :  219.5   Median :  87.97   Median : 181.7   Median :  3074  
##  Mean   :  456.4   Mean   : 171.10   Mean   : 215.3   Mean   :  6091  
##  3rd Qu.:  473.5   3rd Qu.: 204.09   3rd Qu.: 217.8   3rd Qu.:  3777  
##  Max.   :37482.2   Max.   :9283.67   Max.   :2346.4   Max.   :661768  
##  GVA_INDUSTRY_SE      Intercept_TV      Brazilian_Percentage_TV
##  Min.   :0.0005030   Min.   :-1.60454   Min.   :-3.6730        
##  1st Qu.:0.0005252   1st Qu.:-0.35136   1st Qu.:-3.1320        
##  Median :0.0006063   Median :-0.09964   Median :-1.3389        
##  Mean   :0.0014974   Mean   :-0.10081   Mean   :-1.4713        
##  3rd Qu.:0.0013284   3rd Qu.: 0.49017   3rd Qu.: 0.2010        
##  Max.   :0.1672912   Max.   : 0.77461   Max.   : 0.3153        
##  Working_Percentage_TV Tourism_Area_TV   IDHM_Educacao_TV   GVA_AGROPEC_TV  
##  Min.   :-9.2812       Min.   :-0.1406   Min.   :-0.04887   Min.   : 1.572  
##  1st Qu.:-6.1056       1st Qu.: 4.5437   1st Qu.: 1.57713   1st Qu.: 6.098  
##  Median :-0.2446       Median : 7.0468   Median :10.45565   Median :11.580  
##  Mean   :-2.1700       Mean   : 6.3630   Mean   : 8.10932   Mean   :10.424  
##  3rd Qu.: 0.9186       3rd Qu.: 7.8857   3rd Qu.:13.01190   3rd Qu.:13.607  
##  Max.   : 1.6866       Max.   : 9.3216   Max.   :14.74981   Max.   :17.540  
##    COMP_D_TV        COMP_E_TV         COMP_O_TV         COMP_U_TV        
##  Min.   :-6.750   Min.   :-15.859   Min.   :-3.1084   Min.   :-12.04307  
##  1st Qu.:-5.606   1st Qu.:-13.447   1st Qu.: 0.9371   1st Qu.: -4.88502  
##  Median :-4.552   Median : -9.822   Median : 7.1621   Median : -3.60810  
##  Mean   :-3.959   Mean   : -8.928   Mean   : 5.9183   Mean   : -3.31230  
##  3rd Qu.:-2.585   3rd Qu.: -2.938   3rd Qu.:10.6317   3rd Qu.: -0.02712  
##  Max.   : 2.085   Max.   : -1.428   Max.   :13.3911   Max.   :  1.63595  
##  GVA_INDUSTRY_TV     Local_R2        coords.x1         coords.x2       
##  Min.   : 1.433   Min.   :0.3539   Min.   :1671725   Min.   : 6039171  
##  1st Qu.:18.541   1st Qu.:0.3772   1st Qu.:4124229   1st Qu.: 7405126  
##  Median :23.823   Median :0.4623   Median :4607998   Median : 7966247  
##  Mean   :21.891   Mean   :0.5264   Mean   :4640696   Mean   : 8135780  
##  3rd Qu.:26.537   3rd Qu.:0.6844   3rd Qu.:5177379   3rd Qu.: 9058340  
##  Max.   :29.505   Max.   :0.9917   Max.   :6175358   Max.   :10507274  
##           geometry   
##  POINT        :5573  
##  epsg:5641    :   0  
##  +proj=merc...:   0  
##                      
##                      
## 
summary(gwr.fixed$SDF$yhat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -13884    6092   11826   15862   23775  226947
mun_brazil_cities.sf.fixed <- st_join(mun_sirgas2000, brazil_cities.sf.fixed) 

9.2 Visualising Local R2

The code chunk below is used to create a polygon (choropleth) map of the Local R2 values.

tm_shape(mun_brazil_cities.sf.fixed) +  
  tm_polygons(col = "Local_R2",
           #size = 0.15,
           border.col = "gray60",
           popup.vars = c("City_State","Local_R2"),
           border.lwd = 1) 

The maximum Local R2 was 0.9917 while the minimum was 0.3539. This means the local model accounts for up to 99.17% of the variation in GDP per capita around some municipalities, but only 35.39% around others. As we can see above, the municipalities that the model explains well are mostly those with lower GDP per capita, whereas Local R2 is much smaller for the higher-GDP municipalities. This suggests that the model is missing variables that affect the higher-GDP municipalities, and that the variables selected mainly explain GDP per capita in the lower-GDP municipalities.
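One way to check this reading of the map numerically is to correlate Local_R2 with the observed GDP per capita (the y column of the SDF). The sketch below does this for only the ten rows printed in section 9.1, so it is illustrative and not conclusive. On the full data, the same call could be applied to the y and Local_R2 columns of gwr.fixed$SDF.

```r
# y (GDP per capita) and Local_R2 for the ten rows printed in section 9.1
y <- c(20664.57, 25591.70, 0, 0, 0, 6370.41, 6982.70, 0, 21173.60, 24739.02)
local_r2 <- c(0.3599996, 0.3580482, 0.3681975, 0.3625905, 0.8138496,
              0.7088541, 0.5844539, 0.7107090, 0.3890840, 0.4519263)

# Rank correlation is used because GDP per capita is heavily skewed
rho <- cor(y, local_r2, method = "spearman")

# A negative value means Local_R2 tends to be lower where GDP per capita is higher
print(rho)
```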

9.3 Visualising Coefficients

The code chunk below is used to create polygon (choropleth) maps for Brazilian_Percentage, Working_Percentage, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, COMP_D, COMP_E, COMP_O, COMP_U and GVA_INDUSTRY, and to arrange them in a two-column layout.

Brazilian_Percentage <- tm_shape(mun_brazil_cities.sf.fixed) +  
                          tm_polygons(col = "Brazilian_Percentage",
                                   #size = 0.15,
                                   border.col = "gray60",
                                   border.lwd = 1) +
                          tm_view(set.zoom.limits = c(11,14))
Working_Percentage <- tm_shape(mun_brazil_cities.sf.fixed) +  
                          tm_polygons(col = "Working_Percentage",
                                   #size = 0.15,
                                   border.col = "gray60",
                                   border.lwd = 1) +
                          tm_view(set.zoom.limits = c(11,14))
Tourism_Area <- tm_shape(mun_brazil_cities.sf.fixed) +  
                    tm_polygons(col = "Tourism_Area",
                             #size = 0.15,
                             border.col = "gray60",
                             border.lwd = 1) +
                    tm_view(set.zoom.limits = c(11,14))
IDHM_Educacao <- tm_shape(mun_brazil_cities.sf.fixed) +  
                    tm_polygons(col = "IDHM_Educacao",
                             #size = 0.15,
                             border.col = "gray60",
                             border.lwd = 1) +
                    tm_view(set.zoom.limits = c(11,14))
GVA_AGROPEC <- tm_shape(mun_brazil_cities.sf.fixed) +  
                  tm_polygons(col = "GVA_AGROPEC",
                           #size = 0.15,
                           border.col = "gray60",
                           border.lwd = 1) +
                  tm_view(set.zoom.limits = c(11,14))
COMP_D <- tm_shape(mun_brazil_cities.sf.fixed) +  
            tm_polygons(col = "COMP_D",
                     #size = 0.15,
                     border.col = "gray60",
                     border.lwd = 1) +
            tm_view(set.zoom.limits = c(11,14))
COMP_O <- tm_shape(mun_brazil_cities.sf.fixed) +  
            tm_polygons(col = "COMP_O",
                     #size = 0.15,
                     border.col = "gray60",
                     border.lwd = 1) +
            tm_view(set.zoom.limits = c(11,14))
COMP_E <- tm_shape(mun_brazil_cities.sf.fixed) +  
            tm_polygons(col = "COMP_E",
                     #size = 0.15,
                     border.col = "gray60",
                     border.lwd = 1) +
            tm_view(set.zoom.limits = c(11,14))
COMP_U <- tm_shape(mun_brazil_cities.sf.fixed) +  
            tm_polygons(col = "COMP_U",
                     #size = 0.15,
                     border.col = "gray60",
                     border.lwd = 1) +
            tm_view(set.zoom.limits = c(11,14))
GVA_INDUSTRY <- tm_shape(mun_brazil_cities.sf.fixed) +  
            tm_polygons(col = "GVA_INDUSTRY",
                     #size = 0.15,
                     border.col = "gray60",
                     border.lwd = 1) +
            tm_view(set.zoom.limits = c(11,14))
tmap_arrange(Brazilian_Percentage, Working_Percentage, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, COMP_D, COMP_E, COMP_O, COMP_U, GVA_INDUSTRY, ncol=2)
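
When reading these maps, it helps to confirm which columns are being plotted. The summary() output in section 9.1 lists each model variable twice, for example Tourism_Area (range 0 to 1, the original variable) and Tourism_Area.1 (range -1908 to 9486, the GWR coefficient estimate). The sketch below uses toy data frames of my own to show how base R's data.frame() produces such .1 suffixes when column names collide. That cbind() on the sf objects above deduplicates names in the same way is an assumption, although it is consistent with that summary.

```r
# Toy stand-ins for the original variables and the GWR output (illustration only)
original <- data.frame(Tourism_Area = c(0, 1, 1))
gwr_out  <- data.frame(Tourism_Area = c(5774.894, 5171.673, 5759.233))

# data.frame() makes duplicate names unique by appending .1, .2, ...
combined <- data.frame(original, gwr_out)
print(names(combined))

# Columns whose names end in ".1" are the candidates for the coefficient estimates
coef_cols <- grep("\\.1$", names(combined), value = TRUE)
print(coef_cols)
```

On the real data, names(mun_brazil_cities.sf.fixed) would show which of the two columns a given map uses.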

Based on the choropleth maps above, Brazilian_Percentage, Working_Percentage and IDHM_Educacao show the biggest contrast. Areas with a higher GDP per capita usually have a lower share of Brazilian residents, which means that foreigners make up a larger share of the population in those municipalities. For Working_Percentage, municipalities with a higher GDP per capita usually have a larger working population. Lastly, for IDHM_Educacao, municipalities with a higher education index also tend to have a higher GDP per capita. Among the other variables, Tourism_Area shows a slight difference: municipalities with a higher GDP per capita tend to be tourism areas. The remaining variables show only minor differences between municipalities. From this, I conclude that the variables that affect GDP per capita are the share of Brazilian residents, the share of working adults, the education level and whether the municipality is a tourist destination. As noted earlier, more variables could be added to improve the accuracy of the model. However, due to time constraints, I will be rounding off the report here.