https://rpubs.com/Huiling/take-home_ex04
I will be publishing as static maps.
In this take-home exercise, you are tasked to segment Singapore at the planning subzone level into homogeneous socioeconomic areas by combining geodemographic data extracted from Singapore Department of Statistics and urban functions extracted from the geospatial data provided.
To provide answers to the questions above, the data sets used are:
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'dplyr', 'rgdal')
# Install any package that is missing, then attach it
for (p in packages){
if(!require(p, character.only = TRUE)){
install.packages(p)
}
library(p, character.only = TRUE)
}
brazil_cities <- read_csv("data/aspatial/BRAZIL_CITIES.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## CITY = col_character(),
## STATE = col_character(),
## REGIAO_TUR = col_character(),
## CATEGORIA_TUR = col_character(),
## RURAL_URBAN = col_character(),
## GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
I have selected the following variables from the table as listed below:
Dependent Variable:
Independent Variables:
brazil_cities_summarised <- brazil_cities %>%
select(GDP_CAPITA, CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, `IBGE_15-59`, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, `IBGE_CROP_PRODUCTION_$`, IDHM_Educacao, LONG, LAT, REGIAO_TUR, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
Now, I will check for variables that have empty values.
summary(brazil_cities_summarised)
## GDP_CAPITA CITY STATE CAPITAL
## Min. : 3191 Length:5573 Length:5573 Min. :0.000000
## 1st Qu.: 9103 Class :character Class :character 1st Qu.:0.000000
## Median : 16129 Mode :character Mode :character Median :0.000000
## Mean : 21306 Mean :0.004845
## 3rd Qu.: 26152 3rd Qu.:0.000000
## Max. :314638 Max. :1.000000
## NA's :1476
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_15-59
## Min. : 805 Min. : 805 Min. : 0.0 Min. : 94
## 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0 1st Qu.: 1734
## Median : 10934 Median : 10926 Median : 0.0 Median : 3841
## Mean : 34278 Mean : 34200 Mean : 77.5 Mean : 18212
## 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0 3rd Qu.: 9628
## Max. :11253503 Max. :11133776 Max. :119727.0 Max. :7058221
## NA's :8 NA's :8 NA's :8 NA's :8
## AREA IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## Min. : 1.0 Min. : 239 Min. : 60 Min. : 3
## 1st Qu.: 25.0 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 487
## Median :201.8 Median : 3174 Median : 1846 Median : 931
## Mean :266.1 Mean : 10303 Mean : 8859 Mean : 1463
## 3rd Qu.:410.9 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1832
## Max. :999.5 Max. :3576148 Max. :3548433 Max. :33809
## NA's :3 NA's :10 NA's :10 NA's :81
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM_Educacao LONG
## Min. : 0.0 Min. : 0 Min. :0.2070 Min. :-72.92
## 1st Qu.: 910.2 1st Qu.: 2326 1st Qu.:0.4900 1st Qu.:-50.87
## Median : 3471.5 Median : 13846 Median :0.5600 Median :-46.52
## Mean : 14179.9 Mean : 57384 Mean :0.5591 Mean :-46.23
## 3rd Qu.: 11194.2 3rd Qu.: 55619 3rd Qu.:0.6310 3rd Qu.:-41.40
## Max. :1205669.0 Max. :3274885 Max. :0.8250 Max. :-32.44
## NA's :3 NA's :3 NA's :8 NA's :9
## LAT REGIAO_TUR GVA_AGROPEC GVA_INDUSTRY
## Min. :-33.688 Length:5573 Min. : 0 Min. : 1
## 1st Qu.:-22.838 Class :character 1st Qu.: 3224 1st Qu.: 1684
## Median :-18.089 Mode :character Median : 15941 Median : 6100
## Mean :-16.444 Mean : 31281 Mean : 150813
## 3rd Qu.: -8.489 3rd Qu.: 39534 3rd Qu.: 35684
## Max. : 4.585 Max. :655505 Max. :15043915
## NA's :9 NA's :1476 NA's :1476
## GVA_SERVICES GVA_PUBLIC COMP_A COMP_B
## Min. : 2 Min. : 7 Min. : 0.00 Min. : 0.000
## 1st Qu.: 9426 1st Qu.: 15970 1st Qu.: 3.00 1st Qu.: 0.000
## Median : 26696 Median : 29879 Median : 7.00 Median : 1.000
## Mean : 367181 Mean : 106612 Mean : 36.14 Mean : 3.153
## 3rd Qu.: 98873 3rd Qu.: 66222 3rd Qu.: 22.00 3rd Qu.: 4.000
## Max. :53213122 Max. :10664797 Max. :1948.00 Max. :139.000
## NA's :1476 NA's :1476 NA's :4157 NA's :4157
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.0 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 25.0 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 8.00
## Median : 58.0 Median : 0.000 Median : 1.000 Median : 20.50
## Mean : 173.4 Mean : 0.775 Mean : 4.525 Mean : 95.45
## 3rd Qu.: 151.0 3rd Qu.: 0.000 3rd Qu.: 4.000 3rd Qu.: 61.00
## Max. :6025.0 Max. :143.000 Max. :163.000 Max. :6373.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_G COMP_H COMP_I COMP_J
## Min. : 4.0 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 101.0 1st Qu.: 15.00 1st Qu.: 14.0 1st Qu.: 2.00
## Median : 228.0 Median : 34.00 Median : 32.0 Median : 6.00
## Mean : 699.5 Mean : 89.51 Mean : 122.5 Mean : 44.79
## 3rd Qu.: 575.8 3rd Qu.: 85.00 3rd Qu.: 93.0 3rd Qu.: 21.00
## Max. :33566.0 Max. :3873.00 Max. :6514.0 Max. :4535.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 8.0
## Median : 4.00 Median : 4.50 Median : 17.0 Median : 20.0
## Mean : 29.49 Mean : 33.55 Mean : 104.2 Mean : 179.4
## 3rd Qu.: 12.00 3rd Qu.: 18.00 3rd Qu.: 50.0 3rd Qu.: 73.0
## Max. :3501.00 Max. :2785.00 Max. :11925.0 Max. :17752.0
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 1.000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.: 6.00 1st Qu.: 5.00 1st Qu.: 3.00
## Median : 3.000 Median : 16.00 Median : 15.00 Median : 7.00
## Mean : 3.917 Mean : 58.87 Mean : 68.42 Mean : 25.13
## 3rd Qu.: 4.000 3rd Qu.: 41.25 3rd Qu.: 42.00 3rd Qu.: 20.00
## Max. :120.000 Max. :3325.00 Max. :4642.00 Max. :1436.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. :0.000 Min. : 1.000
## 1st Qu.: 13.00 1st Qu.:0 1st Qu.:0.000 1st Qu.: 1.000
## Median : 31.00 Median :0 Median :0.000 Median : 1.000
## Mean : 95.61 Mean :0 Mean :0.026 Mean : 3.546
## 3rd Qu.: 68.00 3rd Qu.:0 3rd Qu.:0.000 3rd Qu.: 3.000
## Max. :5327.00 Max. :0 Max. :8.000 Max. :46.000
## NA's :4157 NA's :4157 NA's :4157 NA's :5236
## Pr_Agencies Pu_Agencies
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 1.00
## Median : 2.000 Median : 2.00
## Mean : 4.425 Mean : 3.45
## 3rd Qu.: 3.000 3rd Qu.: 3.00
## Max. :273.000 Max. :168.00
## NA's :4299 NA's :4299
From the summary above, it can be seen that GDP_CAPITA, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, IBGE_15-59, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, IBGE_CROP_PRODUCTION_$, IDHM_Educacao, LONG, LAT, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies and Pu_Agencies all contain NA values. Hence, these values need to be replaced with either 0 or the actual value.
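Reading the NA counts off a long summary() printout is error-prone, so here is a minimal sketch of how the same information could be tabulated directly. The data frame `toy_cities` and its values are made up for illustration; only the column names are borrowed from the data above.

```r
# Toy data frame standing in for brazil_cities_summarised (values are made up)
toy_cities <- data.frame(
  CITY       = c("A", "B", "C", "D"),
  GDP_CAPITA = c(9103, NA, 16129, NA),
  LONG       = c(-46.5, -50.9, NA, -41.4),
  HOTELS     = c(NA, NA, NA, 3)
)

# Count the missing values in every column
na_counts <- colSums(is.na(toy_cities))

# Keep only the columns that actually have missing values
na_counts[na_counts > 0]
```

Applied to the real data frame, the same two lines would list every variable with missing values together with its count.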
For the variables LONG and LAT, I will be referring to https://en.db-city.com/ for the missing coordinates, since these cannot be changed to 0 or left as NA. I will be making the change within the Excel file: I have duplicated the file and filled in the LONG and LAT values that were missing.
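The same fix could also be scripted so that it is reproducible, instead of editing the file by hand. The sketch below is only an illustration under assumptions: `toy_cities`, `coord_lookup` and all coordinate values are hypothetical, and the lookup table would have to be filled in from the reference website.

```r
# Toy city table with one missing coordinate pair (values are made up)
toy_cities <- data.frame(
  CITY  = c("A", "B", "C"),
  STATE = c("SP", "RJ", "MG"),
  LONG  = c(-46.6, NA, -43.9),
  LAT   = c(-23.5, NA, -19.9)
)

# Hypothetical lookup table holding the coordinates found manually
coord_lookup <- data.frame(
  CITY = "B", STATE = "RJ", LONG = -43.2, LAT = -22.9
)

# Match rows on CITY + STATE, since city names can repeat across states
idx <- match(paste(toy_cities$CITY, toy_cities$STATE),
             paste(coord_lookup$CITY, coord_lookup$STATE))

# Fill a coordinate only where it is missing and a lookup entry exists
fill_long <- is.na(toy_cities$LONG) & !is.na(idx)
fill_lat  <- is.na(toy_cities$LAT)  & !is.na(idx)
toy_cities$LONG[fill_long] <- coord_lookup$LONG[idx[fill_long]]
toy_cities$LAT[fill_lat]   <- coord_lookup$LAT[idx[fill_lat]]

toy_cities
```

Existing coordinates are left untouched, so the lookup can only fill gaps and never overwrite data.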
brazil_cities <- read_csv("data/aspatial/BRAZIL_CITIES_2.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## CITY = col_character(),
## STATE = col_character(),
## REGIAO_TUR = col_character(),
## CATEGORIA_TUR = col_character(),
## RURAL_URBAN = col_character(),
## GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
brazil_cities_summarised <- brazil_cities %>%
select(GDP_CAPITA, CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, `IBGE_15-59`, AREA, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_PLANTED_AREA, `IBGE_CROP_PRODUCTION_$`, IDHM_Educacao, LONG, LAT, REGIAO_TUR, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
Now we will check whether there are any empty values in the LONG and LAT variables.
summary(brazil_cities_summarised)
## GDP_CAPITA CITY STATE CAPITAL
## Min. : 3191 Length:5573 Length:5573 Min. :0.000000
## 1st Qu.: 9103 Class :character Class :character 1st Qu.:0.000000
## Median : 16129 Mode :character Mode :character Median :0.000000
## Mean : 21306 Mean :0.004845
## 3rd Qu.: 26152 3rd Qu.:0.000000
## Max. :314638 Max. :1.000000
## NA's :1476
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_15-59
## Min. : 805 Min. : 805 Min. : 0.0 Min. : 94
## 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0 1st Qu.: 1734
## Median : 10934 Median : 10926 Median : 0.0 Median : 3841
## Mean : 34278 Mean : 34200 Mean : 77.5 Mean : 18212
## 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0 3rd Qu.: 9628
## Max. :11253503 Max. :11133776 Max. :119727.0 Max. :7058221
## NA's :8 NA's :8 NA's :8 NA's :8
## AREA IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## Min. : 1.0 Min. : 239 Min. : 60 Min. : 3
## 1st Qu.: 25.0 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 487
## Median :201.8 Median : 3174 Median : 1846 Median : 931
## Mean :266.1 Mean : 10303 Mean : 8859 Mean : 1463
## 3rd Qu.:410.9 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1832
## Max. :999.5 Max. :3576148 Max. :3548433 Max. :33809
## NA's :3 NA's :10 NA's :10 NA's :81
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM_Educacao LONG
## Min. : 0.0 Min. : 0 Min. :0.2070 Min. :-72.92
## 1st Qu.: 910.2 1st Qu.: 2326 1st Qu.:0.4900 1st Qu.:-50.87
## Median : 3471.5 Median : 13846 Median :0.5600 Median :-46.52
## Mean : 14179.9 Mean : 57384 Mean :0.5591 Mean :-46.23
## 3rd Qu.: 11194.2 3rd Qu.: 55619 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :1205669.0 Max. :3274885 Max. :0.8250 Max. :-32.44
## NA's :3 NA's :3 NA's :8
## LAT REGIAO_TUR GVA_AGROPEC GVA_INDUSTRY
## Min. :-33.688 Length:5573 Min. : 0 Min. : 1
## 1st Qu.:-22.843 Class :character 1st Qu.: 3224 1st Qu.: 1684
## Median :-18.091 Mode :character Median : 15941 Median : 6100
## Mean :-16.451 Mean : 31281 Mean : 150813
## 3rd Qu.: -8.490 3rd Qu.: 39534 3rd Qu.: 35684
## Max. : 4.585 Max. :655505 Max. :15043915
## NA's :1476 NA's :1476
## GVA_SERVICES GVA_PUBLIC COMP_A COMP_B
## Min. : 2 Min. : 7 Min. : 0.00 Min. : 0.000
## 1st Qu.: 9426 1st Qu.: 15970 1st Qu.: 3.00 1st Qu.: 0.000
## Median : 26696 Median : 29879 Median : 7.00 Median : 1.000
## Mean : 367181 Mean : 106612 Mean : 36.14 Mean : 3.153
## 3rd Qu.: 98873 3rd Qu.: 66222 3rd Qu.: 22.00 3rd Qu.: 4.000
## Max. :53213122 Max. :10664797 Max. :1948.00 Max. :139.000
## NA's :1476 NA's :1476 NA's :4157 NA's :4157
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.0 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 25.0 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 8.00
## Median : 58.0 Median : 0.000 Median : 1.000 Median : 20.50
## Mean : 173.4 Mean : 0.775 Mean : 4.525 Mean : 95.45
## 3rd Qu.: 151.0 3rd Qu.: 0.000 3rd Qu.: 4.000 3rd Qu.: 61.00
## Max. :6025.0 Max. :143.000 Max. :163.000 Max. :6373.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_G COMP_H COMP_I COMP_J
## Min. : 4.0 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 101.0 1st Qu.: 15.00 1st Qu.: 14.0 1st Qu.: 2.00
## Median : 228.0 Median : 34.00 Median : 32.0 Median : 6.00
## Mean : 699.5 Mean : 89.51 Mean : 122.5 Mean : 44.79
## 3rd Qu.: 575.8 3rd Qu.: 85.00 3rd Qu.: 93.0 3rd Qu.: 21.00
## Max. :33566.0 Max. :3873.00 Max. :6514.0 Max. :4535.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 8.0
## Median : 4.00 Median : 4.50 Median : 17.0 Median : 20.0
## Mean : 29.49 Mean : 33.55 Mean : 104.2 Mean : 179.4
## 3rd Qu.: 12.00 3rd Qu.: 18.00 3rd Qu.: 50.0 3rd Qu.: 73.0
## Max. :3501.00 Max. :2785.00 Max. :11925.0 Max. :17752.0
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 1.000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.: 6.00 1st Qu.: 5.00 1st Qu.: 3.00
## Median : 3.000 Median : 16.00 Median : 15.00 Median : 7.00
## Mean : 3.917 Mean : 58.87 Mean : 68.42 Mean : 25.13
## 3rd Qu.: 4.000 3rd Qu.: 41.25 3rd Qu.: 42.00 3rd Qu.: 20.00
## Max. :120.000 Max. :3325.00 Max. :4642.00 Max. :1436.00
## NA's :4157 NA's :4157 NA's :4157 NA's :4157
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. :0.000 Min. : 1.000
## 1st Qu.: 13.00 1st Qu.:0 1st Qu.:0.000 1st Qu.: 1.000
## Median : 31.00 Median :0 Median :0.000 Median : 1.000
## Mean : 95.61 Mean :0 Mean :0.026 Mean : 3.546
## 3rd Qu.: 68.00 3rd Qu.:0 3rd Qu.:0.000 3rd Qu.: 3.000
## Max. :5327.00 Max. :0 Max. :8.000 Max. :46.000
## NA's :4157 NA's :4157 NA's :4157 NA's :5236
## Pr_Agencies Pu_Agencies
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 1.00
## Median : 2.000 Median : 2.00
## Mean : 4.425 Mean : 3.45
## 3rd Qu.: 3.000 3rd Qu.: 3.00
## Max. :273.000 Max. :168.00
## NA's :4299 NA's :4299
From the results above, it can be seen that the LAT and LONG variables no longer have missing values. Now I will be replacing all the remaining NA values in the numeric variables with 0, using the following code chunk.
brazil_cities_summarised <- brazil_cities_summarised %>%
mutate_if(is.numeric, ~replace(., is.na(.), 0))
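Replacing NA with 0 also changes the summary statistics, which is why the mean of GDP_CAPITA drops from 21306 to 15663 and its minimum from 3191 to 0 in the summaries of this report. A small sketch with made-up numbers (the vector `gdp` is hypothetical) shows the effect:

```r
# Made-up GDP per capita values with two missing entries
gdp <- c(9103, NA, 16129, NA, 26152)

# Mean of the observed values only
mean_observed <- mean(gdp, na.rm = TRUE)

# Same replacement rule as in the chunk above, applied to a plain vector
gdp_zero <- replace(gdp, is.na(gdp), 0)
mean_zero <- mean(gdp_zero)

c(observed = mean_observed, zero_filled = mean_zero)
```

The zero-filled mean is lower because the two missing cities are now counted as having a GDP per capita of 0.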
Now I will check whether there are any more missing values.
summary(brazil_cities_summarised)
## GDP_CAPITA CITY STATE CAPITAL
## Min. : 0 Length:5573 Length:5573 Min. :0.000000
## 1st Qu.: 0 Class :character Class :character 1st Qu.:0.000000
## Median : 10474 Mode :character Mode :character Median :0.000000
## Mean : 15663 Mean :0.004845
## 3rd Qu.: 21967 3rd Qu.:0.000000
## Max. :314638 Max. :1.000000
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_15-59
## Min. : 0 Min. : 0 Min. : 0.00 Min. : 0
## 1st Qu.: 5217 1st Qu.: 5217 1st Qu.: 0.00 1st Qu.: 1728
## Median : 10927 Median : 10916 Median : 0.00 Median : 3835
## Mean : 34229 Mean : 34151 Mean : 77.39 Mean : 18186
## 3rd Qu.: 23397 3rd Qu.: 23380 3rd Qu.: 10.00 3rd Qu.: 9591
## Max. :11253503 Max. :11133776 Max. :119727.00 Max. :7058221
## AREA IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## Min. : 0.0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 25.0 1st Qu.: 1566 1st Qu.: 870 1st Qu.: 470
## Median :201.5 Median : 3169 Median : 1839 Median : 916
## Mean :265.9 Mean : 10284 Mean : 8843 Mean : 1441
## 3rd Qu.:410.9 3rd Qu.: 6718 3rd Qu.: 4615 3rd Qu.: 1812
## Max. :999.5 Max. :3576148 Max. :3548433 Max. :33809
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM_Educacao LONG
## Min. : 0 Min. : 0 Min. :0.0000 Min. :-72.92
## 1st Qu.: 908 1st Qu.: 2322 1st Qu.:0.4900 1st Qu.:-50.87
## Median : 3464 Median : 13832 Median :0.5600 Median :-46.52
## Mean : 14172 Mean : 57353 Mean :0.5583 Mean :-46.23
## 3rd Qu.: 11174 3rd Qu.: 55608 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :1205669 Max. :3274885 Max. :0.8250 Max. :-32.44
## LAT REGIAO_TUR GVA_AGROPEC GVA_INDUSTRY
## Min. :-33.688 Length:5573 Min. : 0 Min. : 0
## 1st Qu.:-22.843 Class :character 1st Qu.: 0 1st Qu.: 0
## Median :-18.091 Mode :character Median : 6127 Median : 2434
## Mean :-16.451 Mean : 22996 Mean : 110871
## 3rd Qu.: -8.490 3rd Qu.: 28262 3rd Qu.: 16754
## Max. : 4.585 Max. :655505 Max. :15043915
## GVA_SERVICES GVA_PUBLIC COMP_A COMP_B
## Min. : 0 Min. : 0 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 12535 Median : 19382 Median : 0.000 Median : 0.0000
## Mean : 269933 Mean : 78376 Mean : 9.182 Mean : 0.8012
## 3rd Qu.: 56919 3rd Qu.: 48201 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :53213122 Max. :10664797 Max. :1948.000 Max. :139.0000
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.0000 Median : 0.00 Median : 0.00
## Mean : 44.07 Mean : 0.1968 Mean : 1.15 Mean : 24.25
## 3rd Qu.: 2.00 3rd Qu.: 0.0000 3rd Qu.: 0.00 3rd Qu.: 1.00
## Max. :6025.00 Max. :143.0000 Max. :163.00 Max. :6373.00
## COMP_G COMP_H COMP_I COMP_J
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 177.7 Mean : 22.74 Mean : 31.12 Mean : 11.38
## 3rd Qu.: 22.0 3rd Qu.: 2.00 3rd Qu.: 2.00 3rd Qu.: 0.00
## Max. :33566.0 Max. :3873.00 Max. :6514.00 Max. :4535.00
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.000 Median : 0.000 Median : 0.00 Median : 0.00
## Mean : 7.494 Mean : 8.525 Mean : 26.47 Mean : 45.57
## 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.: 1.00 3rd Qu.: 1.00
## Max. :3501.000 Max. :2785.000 Max. :11925.00 Max. :17752.00
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.0000 Median : 0.00 Median : 0.00 Median : 0.000
## Mean : 0.9952 Mean : 14.96 Mean : 17.38 Mean : 6.385
## 3rd Qu.: 1.0000 3rd Qu.: 1.00 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :120.0000 Max. :3325.00 Max. :4642.00 Max. :1436.000
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. :0.000000 Min. : 0.0000
## 1st Qu.: 0.00 1st Qu.:0 1st Qu.:0.000000 1st Qu.: 0.0000
## Median : 0.00 Median :0 Median :0.000000 Median : 0.0000
## Mean : 24.29 Mean :0 Mean :0.006639 Mean : 0.2144
## 3rd Qu.: 2.00 3rd Qu.:0 3rd Qu.:0.000000 3rd Qu.: 0.0000
## Max. :5327.00 Max. :0 Max. :8.000000 Max. :46.0000
## Pr_Agencies Pu_Agencies
## Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 0.000 Median : 0.0000
## Mean : 1.012 Mean : 0.7886
## 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :273.000 Max. :168.0000
Now, I will be deriving new variables from the brazil_cities_summarised data to suit the business needs listed above. From the result above, it can be seen that COMP_T is 0 for every city, hence I will be removing it while summarising.
brazil_cities_summ <- brazil_cities_summarised %>%
mutate(`City_State` = paste(CITY, STATE, sep=" - ")) %>%
mutate(`Brazilian_Percentage` = IBGE_RES_POP_BRAS/IBGE_RES_POP) %>%
mutate(`Foreign_Percentage` = IBGE_RES_POP_ESTR/IBGE_RES_POP) %>%
mutate(`Working_Percentage` = `IBGE_15-59`/IBGE_RES_POP) %>%
mutate(`Urban_Percentage` = IBGE_DU_URBAN/IBGE_DU) %>%
mutate(`Rural_Percentage` = IBGE_DU_RURAL/IBGE_DU) %>%
mutate(`Production_Area` = `IBGE_CROP_PRODUCTION_$`/IBGE_PLANTED_AREA) %>%
mutate(`Tourism_Area` = ifelse(is.na(REGIAO_TUR), 0, 1)) %>%
select(City_State, LONG, LAT, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_U, HOTELS, Pr_Agencies, Pu_Agencies)
Now I will check whether the derived variables have any missing values.
summary(brazil_cities_summ)
## City_State LONG LAT GDP_CAPITA
## Length:5573 Min. :-72.92 Min. :-33.688 Min. : 0
## Class :character 1st Qu.:-50.87 1st Qu.:-22.843 1st Qu.: 0
## Mode :character Median :-46.52 Median :-18.091 Median : 10474
## Mean :-46.23 Mean :-16.451 Mean : 15663
## 3rd Qu.:-41.41 3rd Qu.: -8.490 3rd Qu.: 21967
## Max. :-32.44 Max. : 4.585 Max. :314638
##
## Brazilian_Percentage Foreign_Percentage Working_Percentage Urban_Percentage
## Min. :0.6228 Min. :0.000000 Min. :0.02558 Min. :0.04553
## 1st Qu.:0.9993 1st Qu.:0.000000 1st Qu.:0.28244 1st Qu.:0.49157
## Median :1.0000 Median :0.000000 Median :0.39596 Median :0.66277
## Mean :0.9992 Mean :0.000759 Mean :0.39751 Mean :0.65212
## 3rd Qu.:1.0000 3rd Qu.:0.000699 3rd Qu.:0.51724 3rd Qu.:0.83043
## Max. :1.0000 Max. :0.377218 Max. :0.70989 Max. :1.00000
## NA's :8 NA's :8 NA's :8 NA's :10
## Rural_Percentage Production_Area Tourism_Area IDHM_Educacao
## Min. :0.0000 Min. : 0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1696 1st Qu.: 2.507 1st Qu.:0.0000 1st Qu.:0.4900
## Median :0.3372 Median : 3.906 Median :0.0000 Median :0.5600
## Mean :0.3479 Mean : 5.393 Mean :0.4384 Mean :0.5583
## 3rd Qu.:0.5084 3rd Qu.: 6.425 3rd Qu.:1.0000 3rd Qu.:0.6310
## Max. :0.9545 Max. :106.960 Max. :1.0000 Max. :0.8250
## NA's :10 NA's :70
## GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 6127 Median : 2434 Median : 12535 Median : 19382
## Mean : 22996 Mean : 110871 Mean : 269933 Mean : 78376
## 3rd Qu.: 28262 3rd Qu.: 16754 3rd Qu.: 56919 3rd Qu.: 48201
## Max. :655505 Max. :15043915 Max. :53213122 Max. :10664797
##
## COMP_A COMP_B COMP_C COMP_D
## Min. : 0.000 Min. : 0.0000 Min. : 0.00 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.0000
## Median : 0.000 Median : 0.0000 Median : 0.00 Median : 0.0000
## Mean : 9.182 Mean : 0.8012 Mean : 44.07 Mean : 0.1968
## 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 2.00 3rd Qu.: 0.0000
## Max. :1948.000 Max. :139.0000 Max. :6025.00 Max. :143.0000
##
## COMP_E COMP_F COMP_G COMP_H
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 0.00 Median : 0.00 Median : 0.0 Median : 0.00
## Mean : 1.15 Mean : 24.25 Mean : 177.7 Mean : 22.74
## 3rd Qu.: 0.00 3rd Qu.: 1.00 3rd Qu.: 22.0 3rd Qu.: 2.00
## Max. :163.00 Max. :6373.00 Max. :33566.0 Max. :3873.00
##
## COMP_I COMP_J COMP_K COMP_L
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 0.00 Median : 0.00 Median : 0.000 Median : 0.000
## Mean : 31.12 Mean : 11.38 Mean : 7.494 Mean : 8.525
## 3rd Qu.: 2.00 3rd Qu.: 0.00 3rd Qu.: 0.000 3rd Qu.: 0.000
## Max. :6514.00 Max. :4535.00 Max. :3501.000 Max. :2785.000
##
## COMP_M COMP_N COMP_O COMP_P
## Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.00 Median : 0.00 Median : 0.0000 Median : 0.00
## Mean : 26.47 Mean : 45.57 Mean : 0.9952 Mean : 14.96
## 3rd Qu.: 1.00 3rd Qu.: 1.00 3rd Qu.: 1.0000 3rd Qu.: 1.00
## Max. :11925.00 Max. :17752.00 Max. :120.0000 Max. :3325.00
##
## COMP_Q COMP_R COMP_S COMP_U
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. :0.000000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.:0.000000
## Median : 0.00 Median : 0.000 Median : 0.00 Median :0.000000
## Mean : 17.38 Mean : 6.385 Mean : 24.29 Mean :0.006639
## 3rd Qu.: 0.00 3rd Qu.: 0.000 3rd Qu.: 2.00 3rd Qu.:0.000000
## Max. :4642.00 Max. :1436.000 Max. :5327.00 Max. :8.000000
##
## HOTELS Pr_Agencies Pu_Agencies
## Min. : 0.0000 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.000 Median : 0.0000
## Mean : 0.2144 Mean : 1.012 Mean : 0.7886
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :46.0000 Max. :273.000 Max. :168.0000
##
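From the summary above, the derived percentage variables and Production_Area contain NA values. These appear where the denominator is 0 (for example after the earlier replacement step), because 0/0 evaluates to NaN in R and summary() counts NaN as NA. Hence, I will be using the following code chunk to replace these values with 0.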
brazil_cities_summ <- brazil_cities_summ %>%
mutate_if(is.numeric, ~replace(., is.na(.), 0))
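To illustrate where these missing values come from, here is a minimal sketch with made-up population figures. The helper `safe_ratio()` is a hypothetical name, not part of any package; it shows one possible way to avoid the second replacement step by guarding the division itself.

```r
# Made-up population counts; the second city has a total of 0
pop      <- c(1000, 0, 5000)
pop_bras <- c(990, 0, 5000)

# A plain ratio gives NaN for 0/0, and is.na() treats NaN as missing
share <- pop_bras / pop
is.na(share)

# Hypothetical helper: return 0 whenever the denominator is 0
safe_ratio <- function(num, den) ifelse(den == 0, 0, num / den)
safe_ratio(pop_bras, pop)
```

Note that a non-zero numerator divided by 0 would give Inf rather than NaN, which the guard above also maps to 0.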
Currently, the brazil_cities_summ data frame is aspatial. We will convert it to an sf object. The code chunk below converts the brazil_cities_summ data frame into a simple feature data frame by using st_as_sf() of the sf package, and then reprojects it with st_transform().
brazil_cities_summ.sf <- st_as_sf(brazil_cities_summ,
coords = c("LONG", "LAT"),
crs= 4676) %>%
st_transform(crs=5641)
st_crs(brazil_cities_summ.sf)
## Coordinate Reference System:
## User input: EPSG:5641
## wkt:
## PROJCS["SIRGAS 2000 / Brazil Mercator",
## GEOGCS["SIRGAS 2000",
## DATUM["Sistema_de_Referencia_Geocentrico_para_las_AmericaS_2000",
## SPHEROID["GRS 1980",6378137,298.257222101,
## AUTHORITY["EPSG","7019"]],
## TOWGS84[0,0,0,0,0,0,0],
## AUTHORITY["EPSG","6674"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4674"]],
## PROJECTION["Mercator_2SP"],
## PARAMETER["standard_parallel_1",-2],
## PARAMETER["central_meridian",-43],
## PARAMETER["false_easting",5000000],
## PARAMETER["false_northing",10000000],
## UNIT["metre",1,
## AUTHORITY["EPSG","9001"]],
## AXIS["X",EAST],
## AXIS["Y",NORTH],
## AUTHORITY["EPSG","5641"]]
The geospatial data used in this exercise is obtained through read_municipality(). Polygon features are used to represent the municipal boundaries. The GIS data is in the SIRGAS 2000 coordinate system.
The code chunk below is used to import Brazil’s geospatial data. The read_municipality() call of the geobr package is commented out, and the boundaries are read from a saved shapefile by using readOGR() of the rgdal package instead.
#mun <- read_municipality(code_muni="all", year=2016)
mun <- readOGR(dsn = "data/geospatial", layer = "muni_sf")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\IS415-Geospatial Analytics and Applications\Take Home Exercise\IS415_Take-home_Ex04\data\geospatial", layer: "muni_sf"
## with 5572 features
## It has 4 fields
The code chunk below converts the newly imported mun into an sf object and transforms it to the correct EPSG code (i.e. 5641).
mun_sirgas2000 <- st_as_sf(mun) %>%
st_transform(crs=5641)
After transforming the data, the projection of the newly transformed mun_sirgas2000 can be verified by using st_crs() of the sf package, as shown in the code chunk below.
st_crs(mun_sirgas2000)
## Coordinate Reference System:
## User input: EPSG:5641
## wkt:
## PROJCS["SIRGAS 2000 / Brazil Mercator",
## GEOGCS["SIRGAS 2000",
## DATUM["Sistema_de_Referencia_Geocentrico_para_las_AmericaS_2000",
## SPHEROID["GRS 1980",6378137,298.257222101,
## AUTHORITY["EPSG","7019"]],
## TOWGS84[0,0,0,0,0,0,0],
## AUTHORITY["EPSG","6674"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4674"]],
## PROJECTION["Mercator_2SP"],
## PARAMETER["standard_parallel_1",-2],
## PARAMETER["central_meridian",-43],
## PARAMETER["false_easting",5000000],
## PARAMETER["false_northing",10000000],
## UNIT["metre",1,
## AUTHORITY["EPSG","9001"]],
## AXIS["X",EAST],
## AXIS["Y",NORTH],
## AUTHORITY["EPSG","5641"]]
Next, I will reveal the extent of mun_sirgas2000 by using st_bbox() of the sf package.
st_bbox(mun_sirgas2000)
## xmin ymin xmax ymax
## 1552246 6030702 6575781 10583412
In order to plot a choropleth map, I will be combining both data sets with a spatial join, so that each municipality polygon carries the values of the city point that falls within it.
mun_brazil_city <- st_join(mun_sirgas2000, brazil_cities_summ.sf)
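As a rough illustration of what this join does, the sketch below builds three made-up square "municipalities" and two made-up city points, reprojects both to EPSG 5641 and joins them. All object names, coordinates and values are hypothetical, and I have assumed EPSG 4674 (the SIRGAS 2000 geographic CRS that appears in the st_crs() output) for the toy input coordinates.

```r
library(sf)

# Helper for the toy data: a square polygon from its lower-left corner
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Three hypothetical municipalities in longitude/latitude
toy_mun <- st_sf(
  muni = c("A", "B", "C"),
  geometry = st_sfc(square(-48, -16), square(-44, -23), square(-60, -5),
                    crs = 4674)
)

# Two hypothetical city points with one attribute
toy_cities <- st_as_sf(
  data.frame(city = c("Alpha", "Beta"), GDP_CAPITA = c(20000, 35000),
             LONG = c(-47.5, -43.5), LAT = c(-15.5, -22.5)),
  coords = c("LONG", "LAT"), crs = 4674
)

# Both layers must share the same CRS before they are joined
toy_mun_5641    <- st_transform(toy_mun, 5641)
toy_cities_5641 <- st_transform(toy_cities, 5641)

# Default st_join(): left join using st_intersects, so every polygon is kept
toy_joined <- st_join(toy_mun_5641, toy_cities_5641)
st_drop_geometry(toy_joined)
```

Polygon C contains no point, so its city and GDP_CAPITA come back as NA. A municipality without a matching city point in the real data would behave the same way.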
Lastly, we want to reveal the geospatial distribution of GDP Per Capita in Brazil. The map will be prepared by using the tmap package.
Next, the code chunk below is used to create a choropleth map of GDP Per Capita by municipality.
tm_shape(mun_brazil_city) +
tm_polygons(col = "GDP_CAPITA",
alpha = 0.6,
style="quantile",
popup.vars = c("City_State","GDP_CAPITA"))
From the results, we are able to see that there is an uneven distribution of GDP Per Capita within Brazil.
In this section, I will be building multiple linear regression models for GDP Per Capita using lm() of base R.
Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated with each other. If highly correlated independent variables are used in building a regression model by mistake, the quality of the model will be compromised.
A correlation matrix is commonly used to visualise the relationships between the independent variables. Besides pairs() of base R, there are many packages that support the display of a correlation matrix. In this section, the corrplot package will be used.
The code chunk below is used to plot a correlation matrix of the independent variables in the brazil_cities_summ data.frame. I will be excluding variables whose pairwise correlation is higher than 0.8. I have used “AOE” as the order to group the blues and the reds in each corner, and I will be using “number” as the method so that I can identify exactly which correlations are bigger than 0.8.
corrplot(cor(brazil_cities_summ[,5:39]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")
From the correlation matrix, the following pairs are highly correlated with each other: Urban_Percentage and Working_Percentage, COMP_C and COMP_E, GVA_PUBLIC and COMP_G, COMP_G and COMP_I, COMP_I and GVA_SERVICES, GVA_SERVICES and Pu_Agencies, Pu_Agencies and COMP_P, COMP_P and COMP_S, COMP_S and COMP_R, COMP_R and COMP_F, COMP_F and Pr_Agencies, Pr_Agencies and COMP_Q, COMP_Q and COMP_L, COMP_L and COMP_J, COMP_J and COMP_N, COMP_N and COMP_M, and COMP_M and COMP_K. In view of this, it is wiser to include only one variable from each pair in the subsequent model building. As a result, Urban_Percentage, COMP_C, COMP_G, GVA_SERVICES, COMP_P, COMP_R, Pr_Agencies, COMP_L, COMP_N and COMP_M are excluded from the subsequent model building. Because the plot above is hard to read, I will plot the correlation matrix again after removing these 10 variables to see whether any more variables need to be taken out.
brazil_cities_summ <- brazil_cities_summ %>%
select(City_State, LONG, LAT, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_PUBLIC, COMP_A, COMP_B, COMP_D, COMP_E, COMP_F, COMP_H, COMP_I, COMP_J, COMP_K, COMP_O, COMP_Q, COMP_S, COMP_U, HOTELS, Pu_Agencies)
I will now plot the correlation matrix again.
corrplot(cor(brazil_cities_summ[,5:29]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")
From the correlation matrix, there are still multiple variables that are highly correlated. In view of this, it is wiser to include only one variable from each correlated pair in the subsequent model building. As a result, Rural_Percentage, COMP_K, COMP_J, COMP_Q, COMP_F, COMP_S, Pu_Agencies, COMP_I, COMP_H and GVA_PUBLIC are excluded from the subsequent model building.
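When a correlation plot has too many variables to read comfortably, the highly correlated pairs can also be listed directly from the correlation matrix. The sketch below is one possible way to do this in base R. The data frame `toy_vars` and its values are made up, and the 0.8 cut-off is the one used in this report.

```r
# Made-up variables: COMP_Y is almost a multiple of COMP_X, OTHER is unrelated
toy_vars <- data.frame(
  COMP_X = 1:10,
  COMP_Y = 2 * (1:10) + c(0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1),
  OTHER  = c(3, 8, 1, 10, 5, 6, 2, 9, 4, 7)
)

cm <- cor(toy_vars)

# Positions in the upper triangle whose absolute correlation exceeds 0.8
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)

# One row per highly correlated pair
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
high_pairs
```

Using upper.tri() means each pair is reported once and the diagonal is ignored, mirroring the type = "upper" and diag = FALSE settings of the corrplot() calls.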
The code chunk below uses lm() to calibrate the multiple linear regression model. I will be using a confidence level of 95%, hence the alpha value will be 0.05.
brazil.mlr <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Foreign_Percentage + Working_Percentage + Production_Area + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_A + COMP_B + COMP_D + COMP_E + COMP_O + COMP_U + HOTELS + GVA_INDUSTRY, data=brazil_cities_summ.sf)
summary(brazil.mlr)
##
## Call:
## lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Foreign_Percentage +
## Working_Percentage + Production_Area + Tourism_Area + IDHM_Educacao +
## GVA_AGROPEC + COMP_A + COMP_B + COMP_D + COMP_E + COMP_O +
## COMP_U + HOTELS + GVA_INDUSTRY, data = brazil_cities_summ.sf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -88669 -7038 -2113 3836 259983
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.328e+03 5.559e+03 -0.419 0.6754
## Brazilian_Percentage -2.224e+04 5.692e+03 -3.908 9.41e-05 ***
## Foreign_Percentage -2.938e+03 3.947e+04 -0.074 0.9407
## Working_Percentage -1.142e+04 1.871e+03 -6.101 1.12e-09 ***
## Production_Area 7.861e+00 3.453e+01 0.228 0.8199
## Tourism_Area 7.085e+03 4.655e+02 15.221 < 2e-16 ***
## IDHM_Educacao 6.834e+04 3.007e+03 22.724 < 2e-16 ***
## GVA_AGROPEC 9.947e-02 5.196e-03 19.143 < 2e-16 ***
## COMP_A 6.087e+00 3.547e+00 1.716 0.0862 .
## COMP_B 3.979e+01 6.471e+01 0.615 0.5387
## COMP_D -3.004e+02 1.275e+02 -2.356 0.0185 *
## COMP_E -8.603e+02 6.555e+01 -13.124 < 2e-16 ***
## COMP_O 5.990e+02 9.784e+01 6.123 9.84e-10 ***
## COMP_U -7.573e+03 1.740e+03 -4.352 1.37e-05 ***
## HOTELS -7.259e+01 1.538e+02 -0.472 0.6370
## GVA_INDUSTRY 1.409e-02 4.759e-04 29.595 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15700 on 5557 degrees of freedom
## Multiple R-squared: 0.3929, Adjusted R-squared: 0.3913
## F-statistic: 239.8 on 15 and 5557 DF, p-value: < 2.2e-16
With reference to the report above, it is clear that not all the independent variables are statistically significant. We will revise the model by removing the variables that are not statistically significant at the 0.05 level, namely Foreign_Percentage, Production_Area, COMP_A, COMP_B and HOTELS.
Based on the adjusted R-squared value, this model accounts for about 39.13% of the variation in GDP per capita.
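As a quick check of where the 39.13% comes from, the adjusted R-squared can be reproduced from the multiple R-squared, the number of observations and the number of predictors shown in the report. The sketch below uses the standard adjustment formula; the helper name adj_r2 is my own and is not part of any package.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values read off the summary(brazil.mlr) report: R-squared = 0.3929 and
# 5557 residual degrees of freedom with 15 predictors, so n = 5557 + 15 + 1.
n <- 5557 + 15 + 1
full_adj <- adj_r2(r2 = 0.3929, n = n, p = 15)
round(full_adj, 4)
```

With 5573 observations the penalty for 15 predictors is tiny, which is why the multiple and adjusted R-squared values are almost identical.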
Now, we are ready to calibrate the revised model by using the code chunk below.
brazil.mlr1 <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.sf)
ols_regress(brazil.mlr1)
## Model Summary
## ---------------------------------------------------------------------
## R 0.627 RMSE 15695.730
## R-Squared 0.393 Coef. Var 100.210
## Adj. R-Squared 0.391 MSE 246355925.507
## Pred R-Squared 0.376 MAE 8479.728
## ---------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------------------
## Regression 885468197318.578 10 88546819731.858 359.426 0.0000
## Residual 1.370232e+12 5562 246355925.507
## Total 2.2557e+12 5572
## -----------------------------------------------------------------------------------
##
## Parameter Estimates
## --------------------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## --------------------------------------------------------------------------------------------------------------
## (Intercept) -2404.316 5502.145 -0.437 0.662 -13190.669 8382.038
## Brazilian_Percentage -22419.987 5617.164 -0.043 -3.991 0.000 -33431.822 -11408.153
## Working_Percentage -11206.975 1865.409 -0.082 -6.008 0.000 -14863.906 -7550.044
## Tourism_Area 7118.977 461.724 0.176 15.418 0.000 6213.818 8024.136
## IDHM_Educacao 68752.352 2981.038 0.327 23.063 0.000 62908.352 74596.351
## GVA_AGROPEC 0.100 0.005 0.215 19.448 0.000 0.090 0.111
## COMP_D -303.942 126.558 -0.047 -2.402 0.016 -552.045 -55.839
## COMP_E -846.379 63.859 -0.255 -13.254 0.000 -971.567 -721.190
## COMP_O 599.528 94.557 0.124 6.340 0.000 414.160 784.896
## COMP_U -7616.105 1723.806 -0.073 -4.418 0.000 -10995.438 -4236.772
## GVA_INDUSTRY 0.014 0.000 0.448 29.713 0.000 0.013 0.015
## --------------------------------------------------------------------------------------------------------------
The revised model retains only the variables that are statistically significant, i.e. those with a p-value smaller than 0.05. Removing the non-significant variables barely changes the adjusted R-squared: it was 0.3913 in the first model and is reported as 0.391 here.
Based on the adjusted R-squared value, the revised model accounts for about 39.1% of the variation in GDP per capita.
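The revision step above (drop the predictors whose p-value is not below 0.05, then refit) can also be expressed in code. The sketch below is only an illustration on simulated data: the data frame toy and its columns x1, x2, x3 and y are made up for this example and have nothing to do with the Brazil data.

```r
set.seed(123)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + 3 * toy$x2 + rnorm(n)   # x3 has no real effect on y

full <- lm(y ~ x1 + x2 + x3, data = toy)

# p-values of the slope coefficients (the intercept row is dropped)
pvals <- summary(full)$coefficients[-1, "Pr(>|t|)"]
keep <- names(pvals)[pvals < 0.05]

# Refit using only the retained predictors
revised <- lm(reformulate(keep, response = "y"), data = toy)
c(full = summary(full)$adj.r.squared,
  revised = summary(revised)$adj.r.squared)
```

Comparing the two adjusted R-squared values shows whether the dropped predictors were contributing anything beyond noise.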
In this section, I will be using the olsrr package. I will be using the following methods for building better multiple linear regression models:
In the code chunk below, ols_vif_tol() of the olsrr package is used to test whether there are signs of multicollinearity.
ols_vif_tol(brazil.mlr1)
## Variables Tolerance VIF
## 1 Brazilian_Percentage 0.9589430 1.042815
## 2 Working_Percentage 0.5923563 1.688173
## 3 Tourism_Area 0.8422091 1.187354
## 4 IDHM_Educacao 0.5440004 1.838234
## 5 GVA_AGROPEC 0.8974213 1.114304
## 6 COMP_D 0.2830552 3.532879
## 7 COMP_E 0.2954672 3.384471
## 8 COMP_O 0.2839964 3.521171
## 9 COMP_U 0.3971465 2.517963
## 10 GVA_INDUSTRY 0.4801487 2.082688
Since the VIFs of all the independent variables are less than 10, I can safely conclude that there is no sign of multicollinearity among the independent variables.
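To make the Tolerance and VIF columns less of a black box, the sketch below recomputes them by hand: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R-squared) with Tolerance = 1 / VIF. It uses the built-in mtcars data purely as a stand-in, and vif_manual is a hypothetical helper name, not an olsrr function.

```r
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # predictors without intercept
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared # how well the others explain v
    1 / (1 - r2)
  }, numeric(1))
}

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
vif <- vif_manual(fit)
vif_table <- data.frame(Variables = names(vif),
                        Tolerance = 1 / vif,
                        VIF = vif,
                        row.names = NULL)
vif_table
```

The same relationship holds in the table above: for example, a Tolerance of 0.9589430 corresponds to a VIF of 1 / 0.9589430, which is about 1.0428.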
In multiple linear regression, it is important to test the assumption of linearity and additivity of the relationship between the dependent and independent variables.
In the code chunk below, ols_plot_resid_fit() of the olsrr package is used to perform the linearity assumption test.
ols_plot_resid_fit(brazil.mlr1)
The figure above reveals that most of the data points are scattered around the 0 line, hence we can conclude that the relationships between the dependent variable and the independent variables are linear.
Lastly, the code chunk below uses ols_plot_resid_hist() of olsrr package to perform normality assumption test.
ols_plot_resid_hist(brazil.mlr1)
The figure reveals that the residuals of the multiple linear regression model (i.e. brazil.mlr1) resemble a normal distribution.
ols_test_normality() has a restriction where the sample size must be between 3 and 5000. The current dataset has 5573 observations, hence I will be taking a random sample of 5000 observations to test the normality.
sample_brazil_cities_summ.sf <- brazil_cities_summ.sf[sample(nrow(brazil_cities_summ.sf), 5000), ]
brazil.mlr2 <- lm(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_A + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=sample_brazil_cities_summ.sf)
ols_regress(brazil.mlr2)
## Model Summary
## ---------------------------------------------------------------------
## R 0.628 RMSE 15927.625
## R-Squared 0.394 Coef. Var 101.918
## Adj. R-Squared 0.393 MSE 253689227.042
## Pred R-Squared 0.374 MAE 8517.410
## ---------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------------------
## Regression 824200920636.918 11 74927356421.538 295.351 0.0000
## Residual 1.265402e+12 4988 253689227.042
## Total 2.089603e+12 4999
## -----------------------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------------------------------
## (Intercept) -2869.700 5963.561 -0.481 0.630 -14560.901 8821.502
## Brazilian_Percentage -21493.010 6079.729 -0.040 -3.535 0.000 -33411.953 -9574.067
## Working_Percentage -11590.481 2001.926 -0.083 -5.790 0.000 -15515.135 -7665.826
## Tourism_Area 7135.556 496.391 0.173 14.375 0.000 6162.411 8108.700
## IDHM_Educacao 67969.100 3195.980 0.319 21.267 0.000 61703.574 74234.626
## GVA_AGROPEC 0.102 0.006 0.214 18.341 0.000 0.091 0.113
## COMP_A 8.158 3.770 0.025 2.164 0.031 0.767 15.549
## COMP_D -339.835 136.322 -0.053 -2.493 0.013 -607.086 -72.584
## COMP_E -914.713 68.236 -0.279 -13.405 0.000 -1048.485 -780.941
## COMP_O 667.008 104.792 0.134 6.365 0.000 461.570 872.446
## COMP_U -6439.908 2020.122 -0.056 -3.188 0.001 -10400.236 -2479.581
## GVA_INDUSTRY 0.014 0.000 0.460 28.955 0.000 0.013 0.015
## -------------------------------------------------------------------------------------------------------------
ols_test_normality(brazil.mlr2)
## Warning in ks.test(y, "pnorm", mean(y), sd(y)): ties should not be present for
## the Kolmogorov-Smirnov test
## -----------------------------------------------
## Test Statistic pvalue
## -----------------------------------------------
## Shapiro-Wilk 0.6229 0.0000
## Kolmogorov-Smirnov 0.1786 0.0000
## Cramer-von Mises 463.7117 0.0000
## Anderson-Darling 345.7266 0.0000
## -----------------------------------------------
The summary table above reveals that the p-values of the four tests are far smaller than the alpha value of 0.05. Hence we reject the null hypothesis that the residuals are normally distributed and infer that the residuals do not follow a normal distribution.
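Because the 5000 rows are drawn at random, the test statistics above will differ slightly from run to run. The sketch below shows the same idea on simulated values with a fixed seed so that the draw is repeatable. It relies on base R's shapiro.test(), which I assume is where the 3 to 5000 limit comes from, and the skewed resid_all vector is invented for the illustration.

```r
set.seed(2020)                   # makes the random draw repeatable
resid_all <- rexp(5573) - 1      # simulated, right-skewed "residuals"

# shapiro.test() refuses inputs with more than 5000 values
too_big <- tryCatch(shapiro.test(resid_all),
                    error = function(e) conditionMessage(e))
too_big

# Test a random subset of 5000 values instead
resid_sample <- sample(resid_all, 5000)
sw <- shapiro.test(resid_sample)
sw$p.value < 0.05                # TRUE means normality is rejected
```

Note that this sketch samples the residuals directly, whereas the chunk above samples the rows and refits the model before testing.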
The model I am building uses geographically referenced attributes, hence it is also important for us to visualise the residuals of the regression model. In order to perform the spatial autocorrelation test, there is a need to convert the brazil_cities_summ.sf simple feature object into a SpatialPointsDataFrame.
In this section, I will perform a test of the absence of spatial autocorrelation for the residuals.
The test hypotheses are:
H0 = The residuals are randomly distributed.
H1 = The residuals are not randomly distributed.
The 95% confidence level will be used. Our alpha value will be 0.05.
First, we will export the residuals of the regression model and save them as a data frame. Next, we will join the newly created data frame with the brazil_cities_summ.sf object.
Below is the code chunk used to complete the tasks.
mlr.output <- as.data.frame(brazil.mlr1$residuals)
brazil_cities_summ.res.sf <- cbind(brazil_cities_summ.sf,
brazil.mlr1$residuals) %>%
rename(`MLR_RES` = `brazil.mlr1.residuals`)
In order to plot a choropleth map, I will join the residual data to the municipality boundaries so that the values can be presented.
mun_brazil_cities_mlr.sf <- st_join(mun_sirgas2000, brazil_cities_summ.res.sf)
Next, we will convert the mun_brazil_cities_mlr.sf simple feature object into a SpatialPolygonsDataFrame because the spdep package can only process sp-conformed spatial data objects.
The code chunk below will be used to perform the data conversion process.
mun_brazil_cities_mlr.sp <- as_Spatial(mun_brazil_cities_mlr.sf)
mun_brazil_cities_mlr.sp
## class : SpatialPolygonsDataFrame
## features : 5575
## extent : 1552246, 6575781, 6030702, 10583412 (xmin, xmax, ymin, ymax)
## crs : +proj=merc +lon_0=-43 +lat_ts=-2 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## variables : 42
## names : code_mn, name_mn, cod_stt, abbrv_s, City_State, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, ...
## min values : 1100015, Ângulo, 11, AC, Abadia De Goiás - GO, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## max values : 5300108, Zortéa, 53, TO, Zortéa - SC, 314637.69, 1, 0.377218184890992, 0.709886841723488, 1, 0.954473822447908, 106.960227272727, 1, 0.825, 655505.29, ...
brazil_cities_summ.res.sp <- as_Spatial(brazil_cities_summ.res.sf)
brazil_cities_summ.res.sp
## class : SpatialPointsDataFrame
## features : 5573
## extent : 1671725, 6175358, 6039171, 10507274 (xmin, xmax, ymin, ymax)
## crs : +proj=merc +lon_0=-43 +lat_ts=-2 +x_0=5000000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## variables : 38
## names : City_State, GDP_CAPITA, Brazilian_Percentage, Foreign_Percentage, Working_Percentage, Urban_Percentage, Rural_Percentage, Production_Area, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES, GVA_PUBLIC, COMP_A, ...
## min values : Abadia De Goiás - GO, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## max values : Zortéa - SC, 314637.69, 1, 0.377218184890992, 0.709886841723488, 1, 0.954473822447908, 106.960227272727, 1, 0.825, 655505.29, 15043914.83, 53213121.5, 10664796.91, 1948, ...
The code chunk below is used to create a choropleth map of the residuals.
tm_shape(mun_brazil_cities_mlr.sp) +
tm_polygons(col = "MLR_RES",
alpha = 0.6,
style="quantile",
popup.vars = c("City_State","MLR_RES"))
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
The figure above reveals that there are signs of spatial autocorrelation.
To confirm that this observation is indeed true, the Moran’s I test will be performed.
Next, we will compute the distance-based weight matrix by using the dnearneigh() function of spdep. I will first find the minimum and maximum nearest-neighbour distances, in metres, to decide on the distance bounds to be used in dnearneigh().
coords <- coordinates(brazil_cities_summ.res.sp)
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = FALSE))
summary(k1dists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 645.7 9640.9 13913.2 17755.3 20606.1 363945.4
nb <- dnearneigh(coordinates(brazil_cities_summ.res.sp), 0, 371000, longlat = FALSE)
summary(nb)
## Neighbour list object:
## Number of regions: 5573
## Number of nonzero links: 2783624
## Percentage nonzero weights: 8.962568
## Average number of links: 499.4839
## Link number distribution:
##
## 3 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 2 3 4 7 5 14 7 7 7 5 1 3 4 3 3 6 4 1
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## 8 4 5 3 3 8 2 8 5 11 5 2 5 3 5 2 1 1 3 6
## 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
## 7 7 5 2 8 8 15 7 3 7 4 4 7 4 2 3 3 3 4 2
## 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
## 2 5 2 2 2 4 5 6 4 4 3 2 1 2 4 7 1 1 4 2
## 88 89 93 94 96 97 99 100 101 102 103 104 105 106 107 108 109 110 111 112
## 4 2 1 1 1 2 1 3 1 3 7 2 2 4 5 3 2 3 3 4
## 113 114 115 116 117 118 121 122 123 124 125 126 127 128 129 130 131 132 133 134
## 2 3 1 4 5 1 3 2 3 3 3 1 6 2 4 3 5 9 4 2
## 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
## 1 4 5 6 4 4 8 8 10 1 6 7 9 8 6 7 6 3 5 9
## 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 175
## 1 4 3 6 3 3 4 7 5 5 6 6 1 3 3 5 3 4 9 6
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 197
## 5 3 3 2 1 4 1 6 3 5 6 1 5 3 2 4 3 2 2 4
## 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 214 215 216 218 219
## 5 2 2 5 4 4 3 4 3 5 4 7 1 5 1 4 2 7 5 5
## 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
## 5 7 6 4 4 6 4 5 6 8 8 7 4 3 6 5 9 5 3 2
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259
## 4 7 6 10 7 5 4 8 2 1 9 7 7 9 4 7 6 9 4 4
## 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279
## 4 5 8 7 10 7 7 11 6 11 6 4 6 10 7 6 5 6 6 6
## 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299
## 2 6 5 2 4 2 6 11 4 5 4 3 4 4 4 6 4 2 4 2
## 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319
## 5 4 7 8 6 2 4 5 4 2 8 9 6 1 7 11 3 4 9 9
## 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339
## 6 10 3 3 10 6 7 5 5 4 3 7 5 3 7 9 11 4 5 8
## 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359
## 6 7 4 4 3 8 10 3 6 6 10 7 8 6 6 6 7 9 5 11
## 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379
## 12 8 7 7 7 8 8 5 5 11 5 12 13 13 7 8 9 7 10 7
## 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399
## 9 8 7 6 9 4 11 6 6 8 13 11 14 8 7 3 5 2 9 8
## 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419
## 4 12 7 6 2 5 6 9 2 6 4 6 5 6 9 10 4 12 6 6
## 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439
## 9 4 7 4 13 5 4 6 3 7 2 4 7 4 2 4 5 1 8 5
## 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459
## 2 10 5 7 4 3 1 7 4 4 3 5 4 4 11 8 3 3 3 3
## 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479
## 3 2 3 5 5 4 2 3 7 8 2 9 5 7 6 6 12 7 5 6
## 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499
## 4 4 3 5 5 6 6 9 4 7 5 5 3 5 6 2 11 8 6 4
## 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519
## 5 5 6 3 6 9 3 4 7 7 4 6 2 3 6 6 8 8 6 4
## 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539
## 5 6 5 8 10 10 6 5 13 4 5 7 5 7 4 8 11 6 8 3
## 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559
## 4 3 9 8 2 6 7 5 9 12 1 6 6 10 7 5 5 9 5 10
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579
## 5 8 10 3 6 11 12 10 6 8 13 11 9 6 8 16 8 11 14 11
## 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599
## 13 13 9 10 13 7 10 12 4 7 9 9 11 10 4 7 10 11 9 9
## 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619
## 12 11 6 14 9 5 8 8 7 10 7 12 8 8 15 9 9 12 7 12
## 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639
## 8 13 14 13 6 8 10 10 11 8 12 5 11 16 8 12 12 11 7 17
## 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659
## 17 6 9 13 6 14 13 9 5 12 11 13 14 14 14 12 11 6 9 12
## 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679
## 12 6 8 8 9 7 11 6 9 10 15 10 14 7 5 7 9 11 14 9
## 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699
## 5 18 4 7 19 9 15 12 8 13 11 8 14 6 11 11 8 15 11 7
## 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719
## 13 11 8 9 12 14 15 7 10 11 12 7 10 13 9 9 8 13 11 6
## 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739
## 4 7 8 7 8 9 7 10 11 14 5 13 12 4 6 5 10 10 9 11
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759
## 18 17 6 6 11 11 5 7 8 10 10 8 17 6 9 12 7 15 5 8
## 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779
## 7 14 8 6 9 6 6 8 8 6 8 12 9 8 3 11 10 6 14 7
## 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799
## 12 10 7 9 5 8 9 12 6 6 5 11 10 8 12 8 8 9 10 8
## 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819
## 12 8 8 6 5 7 6 9 9 6 9 7 7 11 3 9 8 9 9 8
## 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839
## 2 9 7 6 4 9 9 13 11 6 10 6 3 2 4 7 6 3 7 8
## 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859
## 5 7 4 3 5 6 8 4 3 3 6 6 4 5 1 4 7 5 4 4
## 860 861 862 863 864 865 866 867 868 870
## 5 2 1 4 2 1 1 2 1 1
## 1 least connected region:
## 1774 with 3 links
## 1 most connected region:
## 4046 with 870 links
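The upper bound of 371000 metres passed to dnearneigh() lies just beyond the largest nearest-neighbour distance (363945.4 metres), which is why even the least connected city still has links. The sketch below illustrates that reasoning with base R only: the 20 toy coordinates are random, and dist() stands in for the spdep functions.

```r
set.seed(1)
toy_coords <- cbind(x = runif(20, 0, 100), y = runif(20, 0, 100))

d <- as.matrix(dist(toy_coords))   # pairwise Euclidean distances
diag(d) <- Inf                     # a point is not its own neighbour

nn_dist <- apply(d, 1, min)        # distance to each point's nearest neighbour
upper <- max(nn_dist)              # smallest bound that leaves no point isolated

links_ok <- rowSums(d <= upper)            # neighbours within the bound
links_short <- rowSums(d <= 0.99 * upper)  # a slightly smaller bound
c(min_links_ok = min(links_ok), min_links_short = min(links_short))
```

With the bound set to the largest nearest-neighbour distance every point has at least one neighbour, while a bound only 1% smaller already leaves one point with none.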
nb_lw <- nb2listw(nb, style = 'W')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5573
## Number of nonzero links: 2783624
## Percentage nonzero weights: 8.962568
## Average number of links: 499.4839
## Link number distribution:
##
## 3 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 2 3 4 7 5 14 7 7 7 5 1 3 4 3 3 6 4 1
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## 8 4 5 3 3 8 2 8 5 11 5 2 5 3 5 2 1 1 3 6
## 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
## 7 7 5 2 8 8 15 7 3 7 4 4 7 4 2 3 3 3 4 2
## 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
## 2 5 2 2 2 4 5 6 4 4 3 2 1 2 4 7 1 1 4 2
## 88 89 93 94 96 97 99 100 101 102 103 104 105 106 107 108 109 110 111 112
## 4 2 1 1 1 2 1 3 1 3 7 2 2 4 5 3 2 3 3 4
## 113 114 115 116 117 118 121 122 123 124 125 126 127 128 129 130 131 132 133 134
## 2 3 1 4 5 1 3 2 3 3 3 1 6 2 4 3 5 9 4 2
## 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
## 1 4 5 6 4 4 8 8 10 1 6 7 9 8 6 7 6 3 5 9
## 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 175
## 1 4 3 6 3 3 4 7 5 5 6 6 1 3 3 5 3 4 9 6
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 197
## 5 3 3 2 1 4 1 6 3 5 6 1 5 3 2 4 3 2 2 4
## 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 214 215 216 218 219
## 5 2 2 5 4 4 3 4 3 5 4 7 1 5 1 4 2 7 5 5
## 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
## 5 7 6 4 4 6 4 5 6 8 8 7 4 3 6 5 9 5 3 2
## 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259
## 4 7 6 10 7 5 4 8 2 1 9 7 7 9 4 7 6 9 4 4
## 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279
## 4 5 8 7 10 7 7 11 6 11 6 4 6 10 7 6 5 6 6 6
## 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299
## 2 6 5 2 4 2 6 11 4 5 4 3 4 4 4 6 4 2 4 2
## 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319
## 5 4 7 8 6 2 4 5 4 2 8 9 6 1 7 11 3 4 9 9
## 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339
## 6 10 3 3 10 6 7 5 5 4 3 7 5 3 7 9 11 4 5 8
## 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359
## 6 7 4 4 3 8 10 3 6 6 10 7 8 6 6 6 7 9 5 11
## 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379
## 12 8 7 7 7 8 8 5 5 11 5 12 13 13 7 8 9 7 10 7
## 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399
## 9 8 7 6 9 4 11 6 6 8 13 11 14 8 7 3 5 2 9 8
## 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419
## 4 12 7 6 2 5 6 9 2 6 4 6 5 6 9 10 4 12 6 6
## 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439
## 9 4 7 4 13 5 4 6 3 7 2 4 7 4 2 4 5 1 8 5
## 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459
## 2 10 5 7 4 3 1 7 4 4 3 5 4 4 11 8 3 3 3 3
## 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479
## 3 2 3 5 5 4 2 3 7 8 2 9 5 7 6 6 12 7 5 6
## 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499
## 4 4 3 5 5 6 6 9 4 7 5 5 3 5 6 2 11 8 6 4
## 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519
## 5 5 6 3 6 9 3 4 7 7 4 6 2 3 6 6 8 8 6 4
## 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539
## 5 6 5 8 10 10 6 5 13 4 5 7 5 7 4 8 11 6 8 3
## 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559
## 4 3 9 8 2 6 7 5 9 12 1 6 6 10 7 5 5 9 5 10
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579
## 5 8 10 3 6 11 12 10 6 8 13 11 9 6 8 16 8 11 14 11
## 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599
## 13 13 9 10 13 7 10 12 4 7 9 9 11 10 4 7 10 11 9 9
## 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619
## 12 11 6 14 9 5 8 8 7 10 7 12 8 8 15 9 9 12 7 12
## 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639
## 8 13 14 13 6 8 10 10 11 8 12 5 11 16 8 12 12 11 7 17
## 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659
## 17 6 9 13 6 14 13 9 5 12 11 13 14 14 14 12 11 6 9 12
## 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679
## 12 6 8 8 9 7 11 6 9 10 15 10 14 7 5 7 9 11 14 9
## 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699
## 5 18 4 7 19 9 15 12 8 13 11 8 14 6 11 11 8 15 11 7
## 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719
## 13 11 8 9 12 14 15 7 10 11 12 7 10 13 9 9 8 13 11 6
## 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739
## 4 7 8 7 8 9 7 10 11 14 5 13 12 4 6 5 10 10 9 11
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759
## 18 17 6 6 11 11 5 7 8 10 10 8 17 6 9 12 7 15 5 8
## 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779
## 7 14 8 6 9 6 6 8 8 6 8 12 9 8 3 11 10 6 14 7
## 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799
## 12 10 7 9 5 8 9 12 6 6 5 11 10 8 12 8 8 9 10 8
## 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819
## 12 8 8 6 5 7 6 9 9 6 9 7 7 11 3 9 8 9 9 8
## 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839
## 2 9 7 6 4 9 9 13 11 6 10 6 3 2 4 7 6 3 7 8
## 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859
## 5 7 4 3 5 6 8 4 3 3 6 6 4 5 1 4 7 5 4 4
## 860 861 862 863 864 865 866 867 868 870
## 5 2 1 4 2 1 1 2 1 1
## 1 least connected region:
## 1774 with 3 links
## 1 most connected region:
## 4046 with 870 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 5573 31058329 5573 46.16184 22428.61
lm.morantest(brazil.mlr1, nb_lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = GDP_CAPITA ~ Brazilian_Percentage +
## Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC +
## COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data =
## brazil_cities_summ.sf)
## weights: nb_lw
##
## Moran I statistic standard deviate = 41.846, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 4.796024e-02 -3.500424e-04 1.332827e-06
The results show that the p-value of the Moran’s I test is less than 0.00000000000000022 (2.2e-16), which is smaller than the alpha value of 0.05. Hence, we reject the null hypothesis that the residuals are randomly distributed. Since the observed Global Moran’s I of 0.04796024 is greater than 0, we can infer that the residuals are positively spatially autocorrelated, i.e. they show signs of clustering, even though the clustering is not very strong.
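To show what the sign of Moran’s I means, the sketch below computes the plain Moran’s I statistic by hand with row-standardised weights (the same idea as style = 'W'). It is a toy example on six locations along a line, and morans_i is a hypothetical helper. It does not reproduce lm.morantest(), which uses an expectation and variance specific to regression residuals.

```r
morans_i <- function(x, adj) {
  W <- adj / rowSums(adj)     # row-standardise so every row sums to 1
  z <- x - mean(x)            # deviations from the mean
  n <- length(x)
  s0 <- sum(W)                # equals n for row-standardised weights
  (n / s0) * sum(z * (W %*% z)) / sum(z^2)
}

# Six locations on a line; each one neighbours the locations next to it
adj <- matrix(0, 6, 6)
adj[abs(row(adj) - col(adj)) == 1] <- 1

i_clustered <- morans_i(c(1, 2, 3, 4, 5, 6), adj)  # similar values sit together
i_dispersed <- morans_i(c(1, 6, 1, 6, 1, 6), adj)  # neighbours always differ
c(clustered = i_clustered, dispersed = i_dispersed)
```

A positive value indicates that similar values sit next to each other (clustering) and a negative value indicates that neighbours tend to differ (dispersion).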
In this section, I will be modelling GDP per capita with GWR using both the fixed and adaptive bandwidth schemes and comparing which to use for GWR visualisation.
In the code chunk below, bw.gwr() of the GWmodel package is used to determine the optimal fixed bandwidth to use in the model. I will be using the cross-validation (CV) approach as the selection criterion. I will be setting adaptive to FALSE since I am currently using the fixed bandwidth method, where the bandwidth is a fixed distance. I will be using the Gaussian kernel, where the weight is calculated as wgt = exp(-.5*(vdist/bw)^2), and longlat is set to FALSE, the same as what I used above for dnearneigh().
bw.fixed <- bw.gwr(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 3921308 CV score: 1.395285e+12
## Fixed bandwidth: 2423986 CV score: 1.378261e+12
## Fixed bandwidth: 1498590 CV score: 1.348028e+12
## Fixed bandwidth: 926664.5 CV score: 1.338012e+12
## Fixed bandwidth: 573194.8 CV score: 1.336135e+12
## Fixed bandwidth: 354738.5 CV score: 1.504225e+12
## Fixed bandwidth: 708208.2 CV score: 1.338323e+12
## Fixed bandwidth: 489751.9 CV score: 1.337712e+12
## Fixed bandwidth: 624765.3 CV score: 1.336911e+12
## Fixed bandwidth: 541322.4 CV score: 1.335897e+12
## Fixed bandwidth: 521624.2 CV score: 1.336033e+12
## Fixed bandwidth: 553496.6 CV score: 1.335945e+12
## Fixed bandwidth: 533798.4 CV score: 1.335909e+12
## Fixed bandwidth: 545972.5 CV score: 1.335907e+12
## Fixed bandwidth: 538448.5 CV score: 1.335897e+12
## Fixed bandwidth: 543098.6 CV score: 1.335899e+12
## Fixed bandwidth: 540224.7 CV score: 1.335896e+12
## Fixed bandwidth: 539546.3 CV score: 1.335897e+12
## Fixed bandwidth: 540644 CV score: 1.335897e+12
## Fixed bandwidth: 539965.6 CV score: 1.335896e+12
## Fixed bandwidth: 539805.4 CV score: 1.335896e+12
## Fixed bandwidth: 540064.5 CV score: 1.335896e+12
## Fixed bandwidth: 539904.4 CV score: 1.335896e+12
## Fixed bandwidth: 540003.4 CV score: 1.335896e+12
## Fixed bandwidth: 539942.2 CV score: 1.335896e+12
## Fixed bandwidth: 539980 CV score: 1.335896e+12
## Fixed bandwidth: 539956.6 CV score: 1.335896e+12
## Fixed bandwidth: 539951.1 CV score: 1.335896e+12
## Fixed bandwidth: 539960 CV score: 1.335896e+12
## Fixed bandwidth: 539954.5 CV score: 1.335896e+12
## Fixed bandwidth: 539953.2 CV score: 1.335896e+12
## Fixed bandwidth: 539955.3 CV score: 1.335896e+12
## Fixed bandwidth: 539955.8 CV score: 1.335896e+12
The result shows that the recommended bandwidth is 539955.3 metres. The unit is metres because the data are in a projected coordinate system based on SIRGAS 2000 whose unit of measurement is the metre.
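To get a feel for what a bandwidth of 539955.3 metres means, the sketch below evaluates the Gaussian kernel formula at a few distances and then fits one local, distance-weighted regression with base R's lm(). The toy points and the names gauss_wgt, pts, pred and resp are assumptions made for the illustration. This is not how GWmodel is implemented internally, only the same weighting idea applied at a single location.

```r
# Gaussian kernel: weight of an observation at distance vdist from the
# regression point, for bandwidth bw (both in metres)
gauss_wgt <- function(vdist, bw) exp(-0.5 * (vdist / bw)^2)

bw <- 539955.3
w_check <- gauss_wgt(c(0, bw, 2 * bw), bw)   # 1, exp(-0.5), exp(-2)
round(w_check, 4)

# One local fit: weight every toy observation by its distance to point 1
set.seed(42)
n <- 50
pts <- cbind(x = runif(n, 0, 2e6), y = runif(n, 0, 2e6))  # toy coordinates
pred <- rnorm(n)
resp <- 1 + 2 * pred + rnorm(n)

d_to_first <- sqrt((pts[, "x"] - pts[1, "x"])^2 + (pts[, "y"] - pts[1, "y"])^2)
w <- gauss_wgt(d_to_first, bw)
local_fit <- lm(resp ~ pred, weights = w)
coef(local_fit)
```

An observation one bandwidth away still receives about 61% of the weight of an observation at the regression point, so with a Gaussian kernel every point contributes to every local fit, only with decreasing influence.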
Now we can use the code chunk below to calibrate the GWR model using the fixed bandwidth and Gaussian kernel. The code chunk below takes all the municipality points and fits the regression with a bandwidth of 539955.3 metres.
gwr.fixed <- gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, bw=bw.fixed, kernel = 'gaussian', longlat = FALSE)
The output is saved in a list of class “gwrm”. The code below can be used to display the model output.
gwr.fixed
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 21:45:36
## Call:
## gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage +
## Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E +
## COMP_O + COMP_U + GVA_INDUSTRY, data = brazil_cities_summ.res.sp,
## bw = bw.fixed, kernel = "gaussian", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: Brazilian_Percentage Working_Percentage Tourism_Area IDHM_Educacao GVA_AGROPEC COMP_D COMP_E COMP_O COMP_U GVA_INDUSTRY
## Number of data points: 5573
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90406 -7025 -2105 3835 260725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.404e+03 5.502e+03 -0.437 0.6621
## Brazilian_Percentage -2.242e+04 5.617e+03 -3.991 6.65e-05 ***
## Working_Percentage -1.121e+04 1.865e+03 -6.008 2.00e-09 ***
## Tourism_Area 7.119e+03 4.617e+02 15.418 < 2e-16 ***
## IDHM_Educacao 6.875e+04 2.981e+03 23.063 < 2e-16 ***
## GVA_AGROPEC 1.004e-01 5.164e-03 19.448 < 2e-16 ***
## COMP_D -3.039e+02 1.266e+02 -2.402 0.0164 *
## COMP_E -8.464e+02 6.386e+01 -13.254 < 2e-16 ***
## COMP_O 5.995e+02 9.456e+01 6.340 2.47e-10 ***
## COMP_U -7.616e+03 1.724e+03 -4.418 1.01e-05 ***
## GVA_INDUSTRY 1.406e-02 4.732e-04 29.713 < 2e-16 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 15700 on 5562 degrees of freedom
## Multiple R-squared: 0.3925
## Adjusted R-squared: 0.3915
## F-statistic: 359.4 on 10 and 5562 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 1.370232e+12
## Sigma(hat): 15683.05
## AIC: 123511.6
## AICc: 123511.6
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 539955.3
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -1.8556e+04 -3.4687e+03 -1.1980e+03 3.3629e+03
## Brazilian_Percentage -2.7412e+04 -2.4407e+04 -1.3465e+04 2.2394e+03
## Working_Percentage -2.8761e+04 -1.4797e+04 -9.9711e+02 3.0143e+03
## Tourism_Area -1.9085e+03 3.8788e+03 4.8524e+03 5.3596e+03
## IDHM_Educacao -9.5579e+02 1.1444e+04 5.2686e+04 6.9595e+04
## GVA_AGROPEC 4.5468e-02 6.7269e-02 8.2598e-02 9.7079e-02
## COMP_D -4.4549e+03 -1.5244e+03 -9.4496e+02 -7.2777e+02
## COMP_E -1.4207e+04 -1.0215e+03 -8.8549e+02 -6.9897e+02
## COMP_O -5.9314e+02 1.7239e+02 1.4658e+03 1.8282e+03
## COMP_U -9.3366e+05 -1.6689e+04 -8.4031e+03 -1.4520e+02
## GVA_INDUSTRY 1.2729e-02 1.3531e-02 1.5149e-02 2.2198e-02
## Max.
## Intercept 5535.7426
## Brazilian_Percentage 4183.2976
## Working_Percentage 6797.9479
## Tourism_Area 9485.6013
## IDHM_Educacao 83829.7524
## GVA_AGROPEC 0.3683
## COMP_D 49994.1585
## COMP_E -369.0832
## COMP_O 5630.5519
## COMP_U 5858.0682
## GVA_INDUSTRY 0.2505
## ************************Diagnostic information*************************
## Number of data points: 5573
## Effective number of parameters (2trace(S) - trace(S'S)): 66.16194
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5506.838
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122818.1
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 122765.2
## Residual sum of squares: 1.192907e+12
## R-square value: 0.4711589
## Adjusted R-square value: 0.464804
##
## ***********************************************************************
## Program stops at: 2020-05-31 21:45:55
From the results above, the geographically weighted regression (GWR) model with a fixed bandwidth fits better than the global multiple regression model. The adjusted R-squared has increased from 0.3915 to 0.464804, and the AIC has decreased from 123511.6 to 122765.2. Hence, the geographically weighted regression gives a better model than the global multiple regression.
Since I will be comparing the adaptive and fixed bandwidth models, I will now calibrate the GWR model using the adaptive bandwidth approach.
The code chunk is similar to the one used to compute the fixed bandwidth, except that the adaptive argument is now set to TRUE. With adaptive = TRUE, the bandwidth (bw) corresponds to a number of nearest neighbours rather than a fixed distance. In addition, I will supply a distance matrix through the dMat argument, which is computed with gw.dist() from the spatial coordinates of the data points.
DM<-gw.dist(dp.locat=coordinates(brazil_cities_summ.res.sp))
bw.adaptive <- bw.gwr(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, approach="CV", kernel="gaussian", adaptive=TRUE, longlat=FALSE, dMat= DM)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 3451 CV score: 1.363368e+12
## Adaptive bandwidth: 2141 CV score: 1.331684e+12
## Adaptive bandwidth: 1329 CV score: 1.335087e+12
## Adaptive bandwidth: 2640 CV score: 1.344421e+12
## Adaptive bandwidth: 1829 CV score: 1.324066e+12
## Adaptive bandwidth: 1640 CV score: 1.323113e+12
## Adaptive bandwidth: 1519 CV score: 1.326376e+12
## Adaptive bandwidth: 1710 CV score: 1.323179e+12
## Adaptive bandwidth: 1591 CV score: 1.323727e+12
## Adaptive bandwidth: 1664 CV score: 1.322965e+12
## Adaptive bandwidth: 1685 CV score: 1.32311e+12
## Adaptive bandwidth: 1657 CV score: 1.322963e+12
## Adaptive bandwidth: 1646 CV score: 1.322912e+12
## Adaptive bandwidth: 1646 CV score: 1.322912e+12
The result shows that 1646 nearest neighbours is the recommended adaptive bandwidth, as it gives the lowest CV score (1.322912e+12).
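To make the idea of an adaptive bandwidth more concrete, here is a minimal sketch in base R on made-up coordinates (not the Brazil data). It assumes the common convention that each regression point gets its own Gaussian bandwidth equal to the distance to its k-th nearest neighbour, excluding the point itself; GWmodel's internal implementation may differ in detail. The names pts, k, local_bw and w are hypothetical.

```r
# Illustrative only: toy coordinates, not the Brazil municipalities.
set.seed(1)
pts <- cbind(x = runif(8, 0, 100), y = runif(8, 0, 100))

# Pairwise Euclidean distances between all points.
d <- as.matrix(dist(pts))

# Adaptive bandwidth expressed as a number of nearest neighbours.
k <- 3

# Local bandwidth of each point: distance to its k-th nearest neighbour.
# Sorted distances start with the point itself (distance 0), hence k + 1.
# (Assumption: the point itself is not counted as a neighbour.)
local_bw <- apply(d, 1, function(row) sort(row)[k + 1])

# Gaussian kernel weights: row i holds the weights used when fitting
# the local regression at point i.
w <- exp(-0.5 * sweep(d, 1, local_bw, "/")^2)

round(local_bw, 2)
round(w[1, ], 3)
```

Under this convention the weight given to the k-th nearest neighbour is always exp(-0.5), about 0.61, so the kernel widens where points are sparse and narrows where they are dense.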
Now, I will calibrate the GWR model using the adaptive bandwidth and a Gaussian kernel, as shown in the code chunk below.
gwr.adaptive <- gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage + Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E + COMP_O + COMP_U + GVA_INDUSTRY, data=brazil_cities_summ.res.sp, bw=bw.adaptive, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat= DM)
The code below can be used to display the model output.
gwr.adaptive
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 21:48:13
## Call:
## gwr.basic(formula = GDP_CAPITA ~ Brazilian_Percentage + Working_Percentage +
## Tourism_Area + IDHM_Educacao + GVA_AGROPEC + COMP_D + COMP_E +
## COMP_O + COMP_U + GVA_INDUSTRY, data = brazil_cities_summ.res.sp,
## bw = bw.adaptive, kernel = "gaussian", adaptive = TRUE, longlat = FALSE,
## dMat = DM)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: Brazilian_Percentage Working_Percentage Tourism_Area IDHM_Educacao GVA_AGROPEC COMP_D COMP_E COMP_O COMP_U GVA_INDUSTRY
## Number of data points: 5573
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90406 -7025 -2105 3835 260725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.404e+03 5.502e+03 -0.437 0.6621
## Brazilian_Percentage -2.242e+04 5.617e+03 -3.991 6.65e-05 ***
## Working_Percentage -1.121e+04 1.865e+03 -6.008 2.00e-09 ***
## Tourism_Area 7.119e+03 4.617e+02 15.418 < 2e-16 ***
## IDHM_Educacao 6.875e+04 2.981e+03 23.063 < 2e-16 ***
## GVA_AGROPEC 1.004e-01 5.164e-03 19.448 < 2e-16 ***
## COMP_D -3.039e+02 1.266e+02 -2.402 0.0164 *
## COMP_E -8.464e+02 6.386e+01 -13.254 < 2e-16 ***
## COMP_O 5.995e+02 9.456e+01 6.340 2.47e-10 ***
## COMP_U -7.616e+03 1.724e+03 -4.418 1.01e-05 ***
## GVA_INDUSTRY 1.406e-02 4.732e-04 29.713 < 2e-16 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 15700 on 5562 degrees of freedom
## Multiple R-squared: 0.3925
## Adjusted R-squared: 0.3915
## F-statistic: 359.4 on 10 and 5562 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 1.370232e+12
## Sigma(hat): 15683.05
## AIC: 123511.6
## AICc: 123511.6
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Adaptive bandwidth: 1646 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: A distance matrix is specified for this model calibration.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -1.1095e+04 -6.3114e+03 -4.0954e+03 2.4134e+03
## Brazilian_Percentage -2.7348e+04 -2.5665e+04 -1.9347e+04 -6.4303e+03
## Working_Percentage -2.2478e+04 -1.5894e+04 -6.9321e+03 2.7160e+03
## Tourism_Area 4.3196e+03 5.1647e+03 5.6113e+03 6.0234e+03
## IDHM_Educacao 1.8012e+04 3.7979e+04 6.3580e+04 7.2073e+04
## GVA_AGROPEC 6.3749e-02 8.0205e-02 8.6906e-02 9.7140e-02
## COMP_D -1.5274e+03 -9.4935e+02 -7.1588e+02 -4.3114e+02
## COMP_E -1.0454e+03 -9.4000e+02 -8.9357e+02 -8.6326e+02
## COMP_O -1.3658e+01 3.1497e+02 1.1099e+03 1.6933e+03
## COMP_U -1.9692e+04 -1.1752e+04 -8.7975e+03 -6.9979e+03
## GVA_INDUSTRY 1.3008e-02 1.3433e-02 1.4026e-02 1.6852e-02
## Max.
## Intercept 4808.6564
## Brazilian_Percentage -40.6551
## Working_Percentage 4784.9676
## Tourism_Area 7608.9509
## IDHM_Educacao 78617.7536
## GVA_AGROPEC 0.1100
## COMP_D -140.3515
## COMP_E -659.5818
## COMP_O 1935.3819
## COMP_U -1628.7573
## GVA_INDUSTRY 0.0209
## ************************Diagnostic information*************************
## Number of data points: 5573
## Effective number of parameters (2trace(S) - trace(S'S)): 30.53195
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5542.468
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 123049.7
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 123023.5
## Residual sum of squares: 1.255332e+12
## R-square value: 0.4434843
## Adjusted R-square value: 0.440418
##
## ***********************************************************************
## Program stops at: 2020-05-31 21:48:35
From the results above, the geographically weighted regression with an adaptive bandwidth also fits better than the global multiple regression model. The adjusted R-squared has increased from 0.3915 to 0.440418, and the AIC has decreased from 123511.6 to 123023.5. Hence, the geographically weighted regression again gives a better model than the global multiple regression.
Comparing the adaptive and fixed bandwidth models, I will use the fixed bandwidth model because its adjusted R-squared is higher (0.464804 versus 0.440418) and its AIC is lower (122765.2 versus 123023.5).
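As a rough cross-check of this choice, the sketch below collects the diagnostics printed in the three model summaries of this report into one table and picks the model with the lowest AICc. The values are copied by hand from the output; the object and column names (model_fit, adj_r2_gain, aicc_drop, best) are hypothetical.

```r
# Diagnostics copied from the model summaries printed in this report.
model_fit <- data.frame(
  model  = c("Global OLS", "GWR fixed bandwidth", "GWR adaptive bandwidth"),
  adj_r2 = c(0.3915, 0.464804, 0.440418),
  aicc   = c(123511.6, 122818.1, 123049.7),
  rss    = c(1.370232e+12, 1.192907e+12, 1.255332e+12),
  stringsAsFactors = FALSE
)

# Improvement of each model over the global regression (first row).
model_fit$adj_r2_gain <- model_fit$adj_r2 - model_fit$adj_r2[1]
model_fit$aicc_drop   <- model_fit$aicc[1] - model_fit$aicc

# Model with the lowest AICc.
best <- model_fit$model[which.min(model_fit$aicc)]

print(model_fit)
print(best)
```

AICc is used for the ranking here because it penalises the larger effective number of parameters of the fixed bandwidth model (66.16 versus 30.53), so the comparison does not rest on goodness of fit alone.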
In this exercise, I will be using the Local R2 values as well as the variable coefficients, as shown below.
Local R2: these values range between 0.0 and 1.0 and indicate how well the local regression model fits observed y values. Very low values indicate the local model is performing poorly. Mapping the Local R2 values to see where GWR predicts well and where it predicts poorly may provide clues about important variables that may be missing from the regression model.
Coefficient Standard Error: these values measure the reliability of each coefficient estimate. Confidence in those estimates is higher when standard errors are small in relation to the actual coefficient values. Large standard errors may indicate problems with local collinearity.
These values are all stored in a SpatialPointsDataFrame or SpatialPolygonsDataFrame object called SDF in the output list. Its “data” slot holds the fit points, GWR coefficient estimates, y values, predicted values, coefficient standard errors and t-values.
To visualise the fields in SDF, we first need to convert it into an sf data.frame by using the code chunk below.
brazil_cities.sf.fixed <- st_as_sf(gwr.fixed$SDF) %>%
st_transform(crs=4676)
brazil_cities.sf.fixed.sirgas2000 <- st_transform(brazil_cities.sf.fixed, 5641)
brazil_cities.sf.fixed.sirgas2000
## Simple feature collection with 5573 features and 39 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1671725 ymin: 6039171 xmax: 6175358 ymax: 10507270
## CRS: EPSG:5641
## First 10 features:
## Intercept Brazilian_Percentage Working_Percentage Tourism_Area
## 1 -3144.9979 -18544.300 -5545.65498 5774.894
## 2 -3214.9323 -18870.714 -4034.40181 5171.673
## 3 -4898.3327 -16017.149 -3748.36141 5759.233
## 4 -5109.5959 -16568.231 -1461.54560 4933.319
## 5 -378.7536 -2800.052 -56.62049 3410.307
## 6 -2262.6052 3033.102 5211.29736 3502.623
## 7 -13085.9487 2539.424 3244.38306 4936.943
## 8 -3010.4234 3235.564 4925.69194 3573.499
## 9 3612.8936 -24808.871 -16892.61030 5077.629
## 10 5020.6147 -24485.546 -21341.76662 5373.567
## IDHM_Educacao GVA_AGROPEC COMP_D COMP_E COMP_O COMP_U
## 1 56678.151 0.11435641 -888.6802 -1200.3418 2363.1734 -19473.776
## 2 58668.416 0.09611855 -777.2283 -1118.0254 2028.6632 -16932.117
## 3 53634.852 0.11682160 -736.8842 -1184.1499 2250.8948 -23534.834
## 4 56942.664 0.08784055 -563.1791 -1061.4201 1702.3320 -18204.853
## 5 12526.243 0.11257821 -3116.2114 -927.2417 1066.3260 -5448.255
## 6 6230.986 0.05016339 -2016.9248 -607.8074 -410.0446 4466.898
## 7 29460.271 0.07457199 -748.1338 -889.1235 319.1103 -17946.766
## 8 7408.581 0.05023498 -2161.7728 -576.0529 -470.9893 4827.403
## 9 71070.032 0.08363214 -1004.7836 -958.0344 1923.1378 -8543.725
## 10 73526.533 0.08989688 -923.9524 -813.1687 1582.1872 -6852.952
## GVA_INDUSTRY y yhat residual CV_Score Stud_residual
## 1 0.01535003 20664.57 18190.192 2474.3782 0 0.16868894
## 2 0.01454948 25591.70 20763.580 4828.1199 0 0.32853246
## 3 0.01543732 0.00 8649.083 -8649.0827 0 -0.58826741
## 4 0.01423313 0.00 9210.683 -9210.6828 0 -0.62663901
## 5 0.01694804 0.00 3527.503 -3527.5026 0 -0.24048510
## 6 0.02540875 6370.41 6112.793 257.6173 0 0.01756732
## 7 0.02095800 6982.70 10392.853 -3410.1527 0 -0.23235544
## 8 0.02632233 0.00 5187.810 -5187.8102 0 -0.35272741
## 9 0.01347565 21173.60 19472.979 1700.6211 0 0.11569231
## 10 0.01305545 24739.02 30303.309 -5564.2894 0 -0.37874989
## Intercept_SE Brazilian_Percentage_SE Working_Percentage_SE Tourism_Area_SE
## 1 11646.291 11751.671 2925.076 644.4293
## 2 9371.692 9502.062 2714.078 614.0907
## 3 11046.372 11117.922 2962.319 640.2940
## 4 7792.127 7929.122 2788.390 622.5630
## 5 14000.490 14423.756 5692.811 1554.2536
## 6 10704.016 10710.061 3410.244 773.4134
## 7 11303.343 11301.680 3069.502 644.0903
## 8 10508.294 10505.228 3311.623 751.8942
## 9 6819.852 7279.425 2394.739 630.1943
## 10 7056.223 7708.729 2654.521 738.8503
## IDHM_Educacao_SE GVA_AGROPEC_SE COMP_D_SE COMP_E_SE COMP_O_SE COMP_U_SE
## 1 4799.106 0.007221642 212.9040 79.08249 181.0953 3137.984
## 2 4534.291 0.006814153 192.3377 71.67762 157.9748 2857.792
## 3 4709.190 0.007344399 221.8439 80.16625 175.1305 3260.319
## 4 4561.050 0.006884230 195.1429 71.34929 151.3758 2876.555
## 5 8941.114 0.031811658 1011.8494 516.56482 420.6442 6552.888
## 6 6075.690 0.009148915 467.7670 212.15986 195.1425 3697.514
## 7 4927.449 0.007648189 238.6086 101.54968 123.0517 3012.292
## 8 5871.028 0.009035869 428.2346 191.42099 178.6005 3552.911
## 9 4861.638 0.006179895 153.8894 70.65013 165.4053 2142.220
## 10 5860.345 0.006742507 167.7534 79.60971 198.2892 2097.650
## GVA_INDUSTRY_SE Intercept_TV Brazilian_Percentage_TV Working_Percentage_TV
## 1 0.0005487270 -0.27004287 -1.5780139 -1.895901265
## 2 0.0005173693 -0.34304715 -1.9859599 -1.486472203
## 3 0.0005547186 -0.44343361 -1.4406603 -1.265347136
## 4 0.0005159234 -0.65573829 -2.0895417 -0.524153888
## 5 0.0034655423 -0.02705288 -0.1941278 -0.009945963
## 6 0.0013291054 -0.21137910 0.2832012 1.528130383
## 7 0.0008045272 -1.15770609 0.2246944 1.056973608
## 8 0.0012830511 -0.28648070 0.3079956 1.487395129
## 9 0.0005035637 0.52976127 -3.4080811 -7.054050414
## 10 0.0005482723 0.71151585 -3.1763403 -8.039779680
## Tourism_Area_TV IDHM_Educacao_TV GVA_AGROPEC_TV COMP_D_TV COMP_E_TV
## 1 8.961253 11.810149 15.835236 -4.174089 -15.178351
## 2 8.421677 12.938830 14.105721 -4.040957 -15.597970
## 3 8.994671 11.389401 15.906216 -3.321634 -14.771177
## 4 7.924210 12.484551 12.759677 -2.885982 -14.876394
## 5 2.194177 1.400971 3.538898 -3.079719 -1.795015
## 6 4.528785 1.025560 5.482988 -4.311816 -2.864856
## 7 7.664985 5.978808 9.750281 -3.135402 -8.755552
## 8 4.752662 1.261888 5.559507 -5.048104 -3.009351
## 9 8.057244 14.618535 13.532938 -6.529257 -13.560264
## 10 7.272877 12.546451 13.332858 -5.507801 -10.214442
## COMP_O_TV COMP_U_TV GVA_INDUSTRY_TV Local_R2 geometry
## 1 13.049333 -6.205825 27.973893 0.3599996 POINT (4283475 8120684)
## 2 12.841693 -5.924896 28.122037 0.3580482 POINT (4510843 7920105)
## 3 12.852675 -7.218567 27.829099 0.3681975 POINT (4363770 8187113)
## 4 11.245737 -6.328700 27.587675 0.3625905 POINT (4727856 7842028)
## 5 2.534983 -0.831428 4.890445 0.8138496 POINT (4345348 9809515)
## 6 -2.101257 1.208082 19.117183 0.7088541 POINT (5439719 9184727)
## 7 2.593303 -5.957844 26.050083 0.5844539 POINT (5148899 8521972)
## 8 -2.637110 1.358718 20.515417 0.7107090 POINT (5432038 9032202)
## 9 11.626819 -3.988258 26.760575 0.3890840 POINT (4186466 7350101)
## 10 7.979188 -3.266966 23.811986 0.4519263 POINT (4107171 6821956)
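The _TV columns in the output above are the local coefficient estimates divided by their standard errors. The short sketch below reproduces this for Brazilian_Percentage in rows 1, 2 and 9, using values copied from the printed output. The 1.96 cut-off is an assumed two-sided 5% threshold under a normal approximation, with no adjustment for multiple testing; the object names are hypothetical.

```r
# Brazilian_Percentage coefficients and standard errors for rows 1, 2 and 9,
# copied from the fixed-bandwidth output above.
coef_est <- c(-18544.300, -18870.714, -24808.871)
coef_se  <- c(11751.671, 9502.062, 7279.425)

# Local t-value = coefficient estimate / standard error.
t_val <- coef_est / coef_se

# Assumed threshold: |t| > 1.96 (two-sided 5%, normal approximation).
significant <- abs(t_val) > 1.96

data.frame(row = c(1, 2, 9), t_val = round(t_val, 4), significant)
```

The computed values agree with the printed Brazilian_Percentage_TV column to four decimal places. Under the assumed threshold the coefficient is not significant at row 1 but is at rows 2 and 9, which illustrates why a local coefficient should be read together with its standard error.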
gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)
brazil_cities.sf.fixed <- cbind(brazil_cities_summ.sf, as.matrix(gwr.fixed.output))
summary(brazil_cities.sf.fixed)
## City_State GDP_CAPITA Brazilian_Percentage Foreign_Percentage
## Length:5573 Min. : 0 Min. :0.0000 Min. :0.0000000
## Class :character 1st Qu.: 0 1st Qu.:0.9993 1st Qu.:0.0000000
## Mode :character Median : 10474 Median :1.0000 Median :0.0000000
## Mean : 15663 Mean :0.9978 Mean :0.0007581
## 3rd Qu.: 21967 3rd Qu.:1.0000 3rd Qu.:0.0006981
## Max. :314638 Max. :1.0000 Max. :0.3772182
## Working_Percentage Urban_Percentage Rural_Percentage Production_Area
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.2815 1st Qu.:0.4907 1st Qu.:0.1686 1st Qu.: 2.442
## Median :0.3957 Median :0.6624 Median :0.3370 Median : 3.860
## Mean :0.3969 Mean :0.6509 Mean :0.3473 Mean : 5.325
## 3rd Qu.:0.5171 3rd Qu.:0.8303 3rd Qu.:0.5080 3rd Qu.: 6.366
## Max. :0.7099 Max. :1.0000 Max. :0.9545 Max. :106.960
## Tourism_Area IDHM_Educacao GVA_AGROPEC GVA_INDUSTRY
## Min. :0.0000 Min. :0.0000 Min. : 0 Min. : 0
## 1st Qu.:0.0000 1st Qu.:0.4900 1st Qu.: 0 1st Qu.: 0
## Median :0.0000 Median :0.5600 Median : 6127 Median : 2434
## Mean :0.4384 Mean :0.5583 Mean : 22996 Mean : 110871
## 3rd Qu.:1.0000 3rd Qu.:0.6310 3rd Qu.: 28262 3rd Qu.: 16754
## Max. :1.0000 Max. :0.8250 Max. :655505 Max. :15043915
## GVA_SERVICES GVA_PUBLIC COMP_A COMP_B
## Min. : 0 Min. : 0 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 12535 Median : 19382 Median : 0.000 Median : 0.0000
## Mean : 269933 Mean : 78376 Mean : 9.182 Mean : 0.8012
## 3rd Qu.: 56919 3rd Qu.: 48201 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :53213122 Max. :10664797 Max. :1948.000 Max. :139.0000
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.0000 Median : 0.00 Median : 0.00
## Mean : 44.07 Mean : 0.1968 Mean : 1.15 Mean : 24.25
## 3rd Qu.: 2.00 3rd Qu.: 0.0000 3rd Qu.: 0.00 3rd Qu.: 1.00
## Max. :6025.00 Max. :143.0000 Max. :163.00 Max. :6373.00
## COMP_G COMP_H COMP_I COMP_J
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 177.7 Mean : 22.74 Mean : 31.12 Mean : 11.38
## 3rd Qu.: 22.0 3rd Qu.: 2.00 3rd Qu.: 2.00 3rd Qu.: 0.00
## Max. :33566.0 Max. :3873.00 Max. :6514.00 Max. :4535.00
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.000 Median : 0.000 Median : 0.00 Median : 0.00
## Mean : 7.494 Mean : 8.525 Mean : 26.47 Mean : 45.57
## 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.: 1.00 3rd Qu.: 1.00
## Max. :3501.000 Max. :2785.000 Max. :11925.00 Max. :17752.00
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.0000 Median : 0.00 Median : 0.00 Median : 0.000
## Mean : 0.9952 Mean : 14.96 Mean : 17.38 Mean : 6.385
## 3rd Qu.: 1.0000 3rd Qu.: 1.00 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :120.0000 Max. :3325.00 Max. :4642.00 Max. :1436.000
## COMP_S COMP_U HOTELS Pr_Agencies
## Min. : 0.00 Min. :0.000000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.:0.000000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.00 Median :0.000000 Median : 0.0000 Median : 0.000
## Mean : 24.29 Mean :0.006639 Mean : 0.2144 Mean : 1.012
## 3rd Qu.: 2.00 3rd Qu.:0.000000 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :5327.00 Max. :8.000000 Max. :46.0000 Max. :273.000
## Pu_Agencies Intercept Brazilian_Percentage.1
## Min. : 0.0000 Min. :-18556 Min. :-27412
## 1st Qu.: 0.0000 1st Qu.: -3469 1st Qu.:-24407
## Median : 0.0000 Median : -1198 Median :-13465
## Mean : 0.7886 Mean : -1656 Mean :-11602
## 3rd Qu.: 0.0000 3rd Qu.: 3363 3rd Qu.: 2239
## Max. :168.0000 Max. : 5536 Max. : 4183
## Working_Percentage.1 Tourism_Area.1 IDHM_Educacao.1 GVA_AGROPEC.1
## Min. :-28760.6 Min. :-1908 Min. : -955.8 Min. :0.04547
## 1st Qu.:-14797.5 1st Qu.: 3879 1st Qu.:11444.0 1st Qu.:0.06727
## Median : -997.1 Median : 4852 Median :52685.6 Median :0.08260
## Mean : -5547.4 Mean : 4743 Mean :43274.5 Mean :0.08813
## 3rd Qu.: 3014.3 3rd Qu.: 5360 3rd Qu.:69595.3 3rd Qu.:0.09708
## Max. : 6797.9 Max. : 9486 Max. :83829.8 Max. :0.36826
## COMP_D.1 COMP_E.1 COMP_O.1 COMP_U.1
## Min. :-4454.9 Min. :-14207.4 Min. :-593.1 Min. :-933662.5
## 1st Qu.:-1524.4 1st Qu.: -1021.5 1st Qu.: 172.4 1st Qu.: -16688.8
## Median : -945.0 Median : -885.5 Median :1465.8 Median : -8403.1
## Mean : -953.9 Mean : -934.3 Mean :1132.3 Mean : -15114.2
## 3rd Qu.: -727.8 3rd Qu.: -699.0 3rd Qu.:1828.2 3rd Qu.: -145.2
## Max. :49994.2 Max. : -369.1 Max. :5630.6 Max. : 5858.1
## GVA_INDUSTRY.1 y yhat residual
## Min. :0.01273 Min. : 0 Min. :-13884 Min. :-64005
## 1st Qu.:0.01353 1st Qu.: 0 1st Qu.: 6092 1st Qu.: -5671
## Median :0.01515 Median : 10474 Median : 11826 Median : -1664
## Mean :0.01830 Mean : 15663 Mean : 15862 Mean : -199
## 3rd Qu.:0.02220 3rd Qu.: 21967 3rd Qu.: 23775 3rd Qu.: 2295
## Max. :0.25051 Max. :314638 Max. :226947 Max. :257425
## CV_Score Stud_residual Intercept_SE Brazilian_Percentage_SE
## Min. :0 Min. :-5.5095 Min. : 6615 Min. : 6876
## 1st Qu.:0 1st Qu.:-0.3866 1st Qu.: 7098 1st Qu.: 7672
## Median :0 Median :-0.1135 Median : 9297 Median : 9406
## Mean :0 Mean :-0.0128 Mean : 9816 Mean : 10115
## 3rd Qu.:0 3rd Qu.: 0.1568 3rd Qu.: 11043 3rd Qu.: 11109
## Max. :0 Max. :17.5090 Max. :232748 Max. :232774
## Working_Percentage_SE Tourism_Area_SE IDHM_Educacao_SE GVA_AGROPEC_SE
## Min. : 2381 Min. : 610.0 Min. : 4482 Min. :0.006165
## 1st Qu.: 2661 1st Qu.: 646.9 1st Qu.: 4868 1st Qu.:0.006752
## Median : 3070 Median : 713.7 Median : 5539 Median :0.007455
## Mean : 3434 Mean : 917.9 Mean : 6032 Mean :0.010913
## 3rd Qu.: 3565 3rd Qu.: 825.4 3rd Qu.: 6505 3rd Qu.:0.009276
## Max. :22958 Max. :13650.5 Max. :27161 Max. :0.234310
## COMP_D_SE COMP_E_SE COMP_O_SE COMP_U_SE
## Min. : 151.8 Min. : 69.70 Min. : 116.4 Min. : 2057
## 1st Qu.: 168.7 1st Qu.: 74.07 1st Qu.: 157.8 1st Qu.: 2275
## Median : 219.5 Median : 87.97 Median : 181.7 Median : 3074
## Mean : 456.4 Mean : 171.10 Mean : 215.3 Mean : 6091
## 3rd Qu.: 473.5 3rd Qu.: 204.09 3rd Qu.: 217.8 3rd Qu.: 3777
## Max. :37482.2 Max. :9283.67 Max. :2346.4 Max. :661768
## GVA_INDUSTRY_SE Intercept_TV Brazilian_Percentage_TV
## Min. :0.0005030 Min. :-1.60454 Min. :-3.6730
## 1st Qu.:0.0005252 1st Qu.:-0.35136 1st Qu.:-3.1320
## Median :0.0006063 Median :-0.09964 Median :-1.3389
## Mean :0.0014974 Mean :-0.10081 Mean :-1.4713
## 3rd Qu.:0.0013284 3rd Qu.: 0.49017 3rd Qu.: 0.2010
## Max. :0.1672912 Max. : 0.77461 Max. : 0.3153
## Working_Percentage_TV Tourism_Area_TV IDHM_Educacao_TV GVA_AGROPEC_TV
## Min. :-9.2812 Min. :-0.1406 Min. :-0.04887 Min. : 1.572
## 1st Qu.:-6.1056 1st Qu.: 4.5437 1st Qu.: 1.57713 1st Qu.: 6.098
## Median :-0.2446 Median : 7.0468 Median :10.45565 Median :11.580
## Mean :-2.1700 Mean : 6.3630 Mean : 8.10932 Mean :10.424
## 3rd Qu.: 0.9186 3rd Qu.: 7.8857 3rd Qu.:13.01190 3rd Qu.:13.607
## Max. : 1.6866 Max. : 9.3216 Max. :14.74981 Max. :17.540
## COMP_D_TV COMP_E_TV COMP_O_TV COMP_U_TV
## Min. :-6.750 Min. :-15.859 Min. :-3.1084 Min. :-12.04307
## 1st Qu.:-5.606 1st Qu.:-13.447 1st Qu.: 0.9371 1st Qu.: -4.88502
## Median :-4.552 Median : -9.822 Median : 7.1621 Median : -3.60810
## Mean :-3.959 Mean : -8.928 Mean : 5.9183 Mean : -3.31230
## 3rd Qu.:-2.585 3rd Qu.: -2.938 3rd Qu.:10.6317 3rd Qu.: -0.02712
## Max. : 2.085 Max. : -1.428 Max. :13.3911 Max. : 1.63595
## GVA_INDUSTRY_TV Local_R2 coords.x1 coords.x2
## Min. : 1.433 Min. :0.3539 Min. :1671725 Min. : 6039171
## 1st Qu.:18.541 1st Qu.:0.3772 1st Qu.:4124229 1st Qu.: 7405126
## Median :23.823 Median :0.4623 Median :4607998 Median : 7966247
## Mean :21.891 Mean :0.5264 Mean :4640696 Mean : 8135780
## 3rd Qu.:26.537 3rd Qu.:0.6844 3rd Qu.:5177379 3rd Qu.: 9058340
## Max. :29.505 Max. :0.9917 Max. :6175358 Max. :10507274
## geometry
## POINT :5573
## epsg:5641 : 0
## +proj=merc...: 0
##
##
##
summary(gwr.fixed$SDF$yhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -13884 6092 11826 15862 23775 226947
mun_brazil_cities.sf.fixed <- st_join(mun_sirgas2000, brazil_cities.sf.fixed)
The code chunk below is used to create an interactive polygon map of the Local R2 values.
tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "Local_R2",
#size = 0.15,
border.col = "gray60",
popup.vars = c("City_State","Local_R2"),
border.lwd = 1)
The maximum Local R2 is 0.9917, while the minimum is 0.3539. This means that the local model explains up to 99.17% of the variation in GDP per capita around some municipalities, but only 35.39% around others. As the map above shows, the model fits well mostly in the lower GDP municipalities, whereas the Local R2 is much smaller in the higher GDP municipalities. This suggests that variables affecting the higher GDP municipalities may be missing from the model, and that the variables I selected mainly explain the lower GDP municipalities.
The code chunks below are used to create interactive polygon maps for Brazilian_Percentage, Working_Percentage, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, COMP_D, COMP_E, COMP_O, COMP_U and GVA_INDUSTRY.
Brazilian_Percentage <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "Brazilian_Percentage",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
Working_Percentage <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "Working_Percentage",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
Tourism_Area <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "Tourism_Area",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
IDHM_Educacao <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "IDHM_Educacao",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
GVA_AGROPEC <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "GVA_AGROPEC",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
COMP_D <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "COMP_D",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
COMP_O <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "COMP_O",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
COMP_E <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "COMP_E",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
COMP_U <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "COMP_U",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
GVA_INDUSTRY <- tm_shape(mun_brazil_cities.sf.fixed) +
tm_polygons(col = "GVA_INDUSTRY",
#size = 0.15,
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(11,14))
tmap_arrange(Brazilian_Percentage, Working_Percentage, Tourism_Area, IDHM_Educacao, GVA_AGROPEC, COMP_D, COMP_E, COMP_O, COMP_U, GVA_INDUSTRY, ncol=2)
Based on the choropleth maps above, Brazilian_Percentage, Working_Percentage and IDHM_Educacao show the biggest contrast. Municipalities with a higher GDP per capita usually have a lower share of Brazilian residents, which means a larger share of foreign residents. For Working_Percentage, municipalities with a higher GDP per capita usually have a larger working population. Lastly, for IDHM_Educacao, municipalities with a higher education index are also those with a higher GDP per capita.
As for the other variables, there is a slight difference for Tourism_Area, where municipalities with a higher GDP are more often tourism areas. For the rest of the variables, there are only minor differences between municipalities. Hence, I conclude that the variables that affect GDP per capita are the share of Brazilian residents, the share of working adults, the education level and whether the municipality is a tourist destination. However, as noted earlier, more variables could be added to improve the accuracy of the model. Due to time constraints, I will round off the report here.