Introduction

In this project, aimed at studying commuting flows in Wallonia, I will try to analyze and study their causes using various methods of statistics and econometrics. I will try to analyze how the commuting flows are interconnected with socio-demographic indicators such as: the unemployment rate, the average income, the population number, the population density, the number of non-indigenous people, the development of road infrastructure and traffic on the roads within the communes. The ultimate goal of the work is to identify the links between improving the road infrastructure, reducing travel time and commuting flows, and finally to use these factors to manage these flows through the development of road infrastructure and the creation of additional jobs in the region for its further development.

Data Collection

All the data about commuting flows was provided by course instructors, additional data about road networks and traffic on roads was downloaded from web site of L’Institut wallon de l’evaluation, de la prospective et de la statistique (IWEPS).

Methodology

In this project, I made a model that explains the outflow of population by an increase in the level of unemployment, respectively, the greater the unemployment rate within the commune, the greater the outflow of people from it and the number of commuting flows increases. I also added to the model a variable such as traffic on roads and a regional road network, trying to explain the commuting flows of high traffic on the roads with their busyness and availability, which is a factor in the mobility of the population.

According to the IWEPS survey 77% of trips of Walloon households are made by road: 71% by car, 5% by bus and 1% by motorcycle. In terms of travel to school and work, the car is also the most used mode, since more than half of trips are made mainly by this mode of transport (66% in Wallonia).

The BELDAM survey also shows that the mean average distance per day of a trip in Walloon is 13.4 km. Looking at the contribution of each of the different modes to the distances traveled through the latest surveys, the predominance of the car is reinforced by comparison with the MOBEL survey of 1999. The contribution of public transport is also reinforced, mainly through the use of the train.

The number of population and population density was added to the model, the increase of these variables should also lead to an increase in commuting flows, namely outflow, because large cities require additional labor, which comes from neighboring communes. The last independent variable is the number of non-indigenous people whose increase should lead to a decrease in the outflow of population, as usual they are coming to other places because of the economic factor with higher number of different jobs.

At the beginning of the project, maps showing the main flows in Wallonia will be created, the first will show the main trends, the largest number of flows and their direction, the second will show all other directions of the flows. After that, will be created map showing the flows to the largest cities in Wallonia, visualizing their relationship with neighboring communes. Poisson’s regression follows. Poisson regression is made to discover whether the origin-destination flow counts are significant and if they can be explained by time, unemployment rate, and distance. The Poisson model is expected to return significant results and Wallonia flows should be explained by our independent variables.

After visualizing and counting the flows, we resort to statistical and econometric methods to understand whether our dependent variable outflow can be explained by independent variables. Within the framework of a descriptive analysis, we will produce maps showing our variables. We also use the methods of spatial autocorrelation of our regression to identify the determinants of a spatial process, to understand are estimation results are biased in presence of spatial dependence and because of need for econometric techniques that account for spatial spill-over effects on the dependent variables or omitted from the model specification. In OLS we will test our regressions results for spatial autocorrelation by Moran’s I and LISA test. Possibly our OLS model will be unable to account for the spatial dependence of both dependent and independent variables, we will test our final OLS iterative for spatial error and lag dependence.

For testing spatial dependence we will use Lagrange Multiplier test. Test provides five statistics for testing for spatial dependence in linear models: 1) simple LM test for error dependence [LMerr], 2) simple LM test for a missing spatially lagged dependent variable [LMlag], 3) tests for error dependence in the possible presence of a missing lagged dependent variable [RLMerr], 4) test for lag dependence in the possible presence of a spatially autocorrelated error term [RLMlag], 5) portmanteau test [SARMA]. After this series of tests, we will test our residuals for heteroscedasticity and spatial autocorrelation.

Following test for spatial dependence, we do Geographically Weighted Regression, by fitting our model, it allows for the identification of spatial non-stationarity in the relationship between geographical phenomenon and its determinants. The theory behind this method is based on Tobler’s first law of Geography: “Everything is related to everything else, but near things are more related than distant things.” The selection process in GWR is performed via Akaike Information Criterion (AIC). AIC is relative, so the value of the “best” model will be compared to both OLS and SLM output to determine what is more effective in explaining the variance of our dependent variable. GWR results will then be tested for non-stationarity and multi-collinearity.

Results of all tests and methods with comments are presented below.

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.3.3

library(maptools)

## Warning: package 'maptools' was built under R version 3.3.3

## Loading required package: sp

## Warning: package 'sp' was built under R version 3.3.3

## Checking rgeos availability: TRUE

library(rgeos)

## Warning: package 'rgeos' was built under R version 3.3.3

## rgeos version: 0.3-23, (SVN revision 546)
##  GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084 
##  Linking to sp version: 1.2-4 
##  Polygon checking: TRUE

library(rgdal)

## Warning: package 'rgdal' was built under R version 3.3.3

## rgdal: version: 1.2-7, (SVN revision 660)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.0.1, released 2015/09/15
##  Path to GDAL shared files: C:/Users/Boris/Documents/R/win-library/3.3/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
##  Path to PROJ.4 shared files: C:/Users/Boris/Documents/R/win-library/3.3/rgdal/proj
##  Linking to sp version: 1.2-4

library(sp)
library(spdep)

## Loading required package: Matrix

library(flows)

## Warning: package 'flows' was built under R version 3.3.3

library(gflowmap)

## Loading required package: Hmisc

## Warning: package 'Hmisc' was built under R version 3.3.3

## Loading required package: lattice

## Loading required package: survival

## Warning: package 'survival' was built under R version 3.3.3

## Loading required package: Formula

## 
## Attaching package: 'Hmisc'

## The following object is masked from 'package:rgeos':
## 
##     translate

## The following object is masked from 'package:maptools':
## 
##     label

## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

## Loading required package: labeling

## Warning: replacing previous import 'Hmisc::translate' by 'rgeos::translate'
## when loading 'gflowmap'

## Warning: replacing previous import 'Hmisc::label' by 'maptools::label' when
## loading 'gflowmap'

library(pbkrtest)

## Warning: package 'pbkrtest' was built under R version 3.3.3

## Loading required package: lme4

## Warning: package 'lme4' was built under R version 3.3.3

library(car)

## Warning: package 'car' was built under R version 3.3.3

library(RColorBrewer)
library(lmtest)

## Warning: package 'lmtest' was built under R version 3.3.3

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(labeling)
library(Hmisc)
library(MASS)

## Warning: package 'MASS' was built under R version 3.3.3

library(raster)

## Warning: package 'raster' was built under R version 3.3.3

## 
## Attaching package: 'raster'

## The following objects are masked from 'package:MASS':
## 
##     area, select

## The following objects are masked from 'package:Hmisc':
## 
##     mask, zoom

library(GISTools)

## Warning: package 'GISTools' was built under R version 3.3.3

library(classInt)

## Warning: package 'classInt' was built under R version 3.3.3

wd <- setwd("/Users/Boris/Desktop/Project stastics/")
##loadinging the data table
data_communes_wallonie <- read.csv("/Users/Boris/Desktop/Project stastics/Wallonia_data1.csv", sep=";")
Walsdf <- readOGR(dsn = wd, "Walcom")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:/Users/Boris/Desktop/Project stastics", layer: "Walcom"
## with 262 features
## It has 9 fields
## Integer64 fields read as strings:  IDCOM NEWFIELD1 ARR

##projecting shapefile
proj4string(Walsdf)<-CRS("+init=epsg:31370")

## Warning in `proj4string<-`(`*tmp*`, value = <S4 object of class structure("CRS", package = "sp")>): A new CRS was assigned to an object with an existing CRS:
## +proj=longlat +ellps=intl +no_defs
## without reprojecting.
## For reprojection, use function spTransform

##loading floes data
test <- read.csv("/Users/Boris/Desktop/Project stastics/Wallonie_OD_commuting/2014.csv", sep=";")
##cleaning table
test$annee <- NULL
##creating flows matrix
INSCODEDF <- data.frame(data_communes_wallonie$INSCODE)
colnames(INSCODEDF)[1] <- "INSCODE"
##creating vector of INSCODE
INSVEC <- INSCODEDF[ , "INSCODE"]
##creating flows data frame 
atest <- test[test$zonout %in% INSVEC,]
btest <- atest[atest$zonin %in% INSVEC,]
##transform to the numeric 
btest <- transform(btest, class=as.numeric(as.character(btest$mouv)))
btest$mouv <- NULL
btestflows <- prepflows(btest, i = "zonout", j = "zonin", fij = "class") 
##cleaning flows inside communes
diag(btestflows) <- 0

##summarize all in and out flows for each commune 
totaloutflows14 <- as.data.frame(rowSums(btestflows, na.rm = TRUE, dims = 1))
totalinflow14 <- as.data.frame(colSums(btestflows, na.rm = TRUE, dims = 1))
##enter the sequence number and rowname INSCODE
names <- rownames(btestflows)
rownames(btestflows) <- NULL
btestflowsgflow <- cbind(INSCODEDF,btestflows)
## add INSCODE colomn in our table of all in and out flows
totalinflow14$INSCODE <- INSCODEDF$INSCODE
totaloutflows14$INSCODE <- INSCODEDF$INSCODE
##merging flows data with our sdf 
testmerge <- merge(data_communes_wallonie, totalinflow14, by = "INSCODE")
Wallonie_m <- merge(testmerge, totaloutflows14, by = "INSCODE")
colnames(Wallonie_m)[44] <- "Inflow2014"
colnames(Wallonie_m)[45] <- "Outflow2014"
attr(Walsdf, "data")$go<-seq_len(nrow(attr(Walsdf, "data")))
m<-merge(attr(Walsdf, "data"),Wallonie_m, by="INSCODE", all.x = TRUE )
attr(Walsdf, "data")<-m[order(m$go),]
##produce a complex Origin-Destination spatial object  to produce a flow map using gflowmap
myOD2<-make.SpatialOD(Walsdf, Wallonie_m, btestflowsgflow,"INSCODE","INSCODE","INSCODE", Diagonal=FALSE)

## [1] "The spatial units of the map you are analysing are:  Polygons"
## [1] "The ID of the flow matrix is not numeric"

flow.map1<-gflows(myOD2, colour="#FF4500",subset="Flows>500",transparency=-0.7, thickness=-1)
flow.map1

##from the flow map with subset of flows more than 500 
##It can be assumed that the most basic flows pass through the E42 and E19 highways that cross ##Wallonia from west to east and connect all the main cities in Wallonia from Liege to Tournai
##subset of flows more than 100
flow.map2<-gflows(myOD2, colour="#32CD32",subset="Flows>100",transparency=-1, thickness=-1)
flow.map2

##on the second map with subset of flow more than 100 appeares highway E25 which stretches from north to south, possibly people use this motorway to go to workplace from Wallonia to Luxembourg

##join the population and flow data to the Spatial Polygon Data Frame
pop_flows<-cbind(Wallonie_m,btestflowsgflow)
pop_flows<-pop_flows[,c(1:26,28:308)]
Walsdflow <- readOGR(dsn = wd, "Walcom")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:/Users/Boris/Desktop/Project stastics", layer: "Walcom"
## with 262 features
## It has 9 fields
## Integer64 fields read as strings:  IDCOM NEWFIELD1 ARR

proj4string(Walsdflow)<-CRS("+init=epsg:31370")

## Warning in `proj4string<-`(`*tmp*`, value = <S4 object of class structure("CRS", package = "sp")>): A new CRS was assigned to an object with an existing CRS:
## +proj=longlat +ellps=intl +no_defs
## without reprojecting.
## For reprojection, use function spTransform

mysdfdata_tmp <- merge(Walsdflow, pop_flows, by.x = "INSCODE", by.y = "INSCODE", sort = FALSE, all = TRUE)

##In order to draw a choropleth map, we have to run the following chunk of code
choro <- function(spdframe, variable) {
  var <- round(spdframe@data[, variable], digits = 2)
  breaks <- classIntervals(var, n = 5, style = "jenks")$brk
  my_colours <- brewer.pal(5, "BuGn")
  plot(spdframe, col = my_colours[findInterval(var, breaks, all.inside = TRUE)],
       axes = FALSE, border = rgb(0.8, 0.8, 0.8))
  invisible(list(b = breaks, c = my_colours))
}
##creating map of main cities with highest inflow rates
##Namur
breaks <- choro(mysdfdata_tmp, "92094")
legend(x = 25000, y = 110000, legend = leglabs(breaks$b), fill = breaks$c, bty = "n")
title("Flows to Namur from neighbourhood communes")
scalebar(50000, xy=c(135000, 25000), type='bar', divs=4, below="Meter")

##Liege
breaks <- choro(mysdfdata_tmp, "62063")
legend(x = 25000, y = 110000, legend = leglabs(breaks$b), fill = breaks$c, bty = "n")
title("Flows to Liege from neighbourhood communes")
scalebar(50000, xy=c(135000, 25000), type='bar', divs=4, below="Meter")

##Charleroi
breaks <- choro(mysdfdata_tmp, "52011")
legend(x = 25000, y = 110000, legend = leglabs(breaks$b), fill = breaks$c, bty = "n")
title("Flows to Charleroi from neighbourhood communes")
scalebar(50000, xy=c(135000, 25000), type='bar', divs=4, below="Meter")

##Mons
breaks <- choro(mysdfdata_tmp, "53053")
legend(x = 25000, y = 110000, legend = leglabs(breaks$b), fill = breaks$c, bty = "n")
title("Flows to Mons from neighbourhood communes")
scalebar(50000, xy=c(135000, 25000), type='bar', divs=4, below="Meter")

Several inflow maps were produced for the major cities in order to identify the major areas of commuting outflow. All of these cities are very attractive for commuting flows because of the activities, which are represented in these cities, and because of their high level of interconnection between each other. It is possible to suppose that traffic on the roads in this communes will be higher, road infrastructure more developed, population and density also will be higher, and high number of inflows and outflows will be higher comparing to the communes situated southern in Wallonia. Higher number of different activities and services attract people to bigger cities.

### poisson regression, trying to explain flows by time, unemployment rate and distance 
## loading our OD data 
WallonieOD <- read.csv("/Users/Boris/Desktop/Project stastics/Wallonie_OD_googlemaps.csv", sep=";")
m<-matrix(data=0, nrow=262*262, ncol=4)
dij<-as.matrix(dist(cbind(Walsdflow$X_COORD,Walsdflow$Y_COORD)))
timem <- prepflows(mat = WallonieOD, i = "INS_OR", j = "INS_DEST", fij = "time_s")
counter<-1
for(i in 1:262) {
  for(j in 1:262){
    
    if(j!=i) {m[counter,1] <- btestflowsgflow[i,j+1] }
    else if (j==i){m[counter,1]<-0}
    
    m[counter,1]<-btestflowsgflow[i,j+1]
    m[counter,2]<-timem[i,j]
    m[counter,3]<-Wallonie_m[j,5]
    m[counter,4]<-dij[i,j]
    counter<-counter+1
  }
}
df<-data.frame(Flow=m[,1],Time=m[,2], Unemployment_6months=m[,3], Dij=m[,4])
df[df == 0] <- NA
Data <- na.omit(df)
un.poisson<-glm(Flow ~ Time + Unemployment_6months + Dij, family=poisson, data=Data)
summary(un.poisson)

## 
## Call:
## glm(formula = Flow ~ Time + Unemployment_6months + Dij, family = poisson, 
##     data = Data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -35.578   -3.472   -1.074    1.138  112.957  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           5.219e+00  3.562e-03 1465.14   <2e-16 ***
## Time                 -1.040e-03  4.872e-06 -213.38   <2e-16 ***
## Unemployment_6months  1.726e-05  1.734e-08  995.55   <2e-16 ***
## Dij                  -2.311e-05  2.482e-07  -93.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2136514  on 24538  degrees of freedom
## Residual deviance:  779992  on 24535  degrees of freedom
## AIC: 856456
## 
## Number of Fisher Scoring iterations: 6

A Poisson regression was run to understand whether can be commuting flows be predicted by explanatory variables and are the relationship is linear. Explanatory variables and results are:

Time - significant, have negative sign and decrease flows, increasing time to 1 unit leads decreasing flows number to ~ 0.00104

Unemployment rate - significant, have positive sign, each increase in unemployment rate leads to increasing flows on 0.00001726

Distance - significant, have negative sign, each increase in distance leads to decrease flows number on 0.00002311

This explain high number of flows to the biggest cities in region from neighborhood communes. Also in big cities usually higher number of unemployment this leads to increasing number of flow, for example good specialist in their field which living in one city and can’t find job in this field is forced to going every day to another place with higher jobs rate in his specific field.

##Descriptive part
choro <- function(spdframe, variable) {
  var <- round(spdframe@data[, variable], digits = 2)
  breaks <- classIntervals(var, n = 5, style = "jenks")$brk
  my_colours <- brewer.pal(5, "Reds")
  plot(spdframe, col = my_colours[findInterval(var, breaks, all.inside = TRUE)],
       axes = FALSE, border = rgb(0.8, 0.8, 0.8))
  invisible(list(b = breaks, c = my_colours))
}
breaks <- choro(Walsdf, "Inflow2014")
legend(x = 25000, y = 70000, legend = leglabs(breaks$b), fill = breaks$c, bty = "n")
title("Wallonia Commuting Inflows 2014")
scalebar(50000, xy=c(40000, 95000), type='bar', divs=4, below="Meter")

##plotting the Density map
hist(Walsdf$Density)

WalDen <- spplot(Walsdf, zcol = "Density", col.regions = colorRampPalette(c("#FFFFE0", "#FF0000"))(262))
WalDen <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#FFFFE0", "#FF0000"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalDen("Density", Walsdf)

## plotting Road traffic (total) 
hist(Walsdf$Road_traffic_total.)

WalRoadtraffic <- spplot(Walsdf, zcol = "Road_traffic_total.", col.regions = colorRampPalette(c("#8B0000", "#FFFFE0"))(262))
WalRoadtraffic <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#FFF5EE", "#FF0000"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalRoadtraffic("Road_traffic_total.", Walsdf)

##Unemplointment rate 6 months 
hist(Walsdf$Unemployment_6months)

WalunEmplointment <- spplot(Walsdf, zcol = "Unemployment_6months", col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(262))
WalunEmplointment <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalunEmplointment("Unemployment_6months", Walsdf)

##Regional_Networks 
hist(Walsdf$Regional_Networks)

WalNetworks <- spplot(Walsdf, zcol = "Regional_Networks", col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(262))
WalNetworks <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalNetworks("Regional_Networks", Walsdf)

## plotting map of Income 2013 
hist(Walsdf$Income_2013)

WalIncome <- spplot(Walsdf, zcol = "Income_2013", col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(262))
WalIncome <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalIncome("Income_2013", Walsdf)

##Non_belges2015
hist(Walsdf$Non_belges2015)

WalNB <- spplot(Walsdf, zcol = "Non_belges2015", col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(262))
WalNB <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("#BBFFFF", "#000080"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}
WalNB("Non_belges2015", Walsdf)

From the series maps of Density, Income, Road traffic, Unemployment rate, Regional networks and Nonnative people we can make several conclusions.

The higher density of population is in Liege there also the highest rate of inflows.
The highest Income rate of the income in commune which in between the northern border of Wallonia and Brussels capitol region and in southern east border with Luxembourg, this can tell us about some limitation for the commuting studies, I suppose that means that people from this communes cross border of the region, work in one place, but living in Wallonia and bringing bigger salaries rate in the region.
Road traffic and regional networks confirm relationships between five most biggest cities in Wallonia region, also it will be interesting to add in the studies of flows border cities such as Aachen in Germany, Lille in France and northern communes of Luxembourg to analyze how and how much people from Walloon region commuting to another countries and what kind of impact it bringing on Walloon region.

##lm testn of our model 
Model1 <- lm(Outflow2014 ~ Road_traffic_total. + Unemployment_6months + Income_2013 + Regional_Networks + Population_2015 + Non_belges2015 + Density, data = Walsdf)
summary(Model1)

## 
## Call:
## lm(formula = Outflow2014 ~ Road_traffic_total. + Unemployment_6months + 
##     Income_2013 + Regional_Networks + Population_2015 + Non_belges2015 + 
##     Density, data = Walsdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2966.3  -452.0  -108.6   328.6  2570.2 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.365e+03  4.643e+02   7.248 5.09e-12 ***
## Road_traffic_total.   1.863e+00  6.910e-01   2.696  0.00749 ** 
## Unemployment_6months -3.221e+01  1.337e+01  -2.408  0.01675 *  
## Income_2013          -4.110e-02  1.304e-02  -3.153  0.00181 ** 
## Regional_Networks    -1.033e+01  3.361e+00  -3.072  0.00236 ** 
## Population_2015       7.571e-02  6.376e-03  11.874  < 2e-16 ***
## Non_belges2015       -4.191e+01  8.876e+00  -4.722 3.87e-06 ***
## Density               8.750e-01  1.641e-01   5.333 2.13e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 701.9 on 254 degrees of freedom
## Multiple R-squared:  0.8997, Adjusted R-squared:  0.897 
## F-statistic: 325.6 on 7 and 254 DF,  p-value: < 2.2e-16

##anova
anova(Model1)

## Analysis of Variance Table
## 
## Response: Outflow2014
##                       Df    Sum Sq   Mean Sq  F value    Pr(>F)    
## Road_traffic_total.    1 829410640 829410640 1683.465 < 2.2e-16 ***
## Unemployment_6months   1  88801978  88801978  180.242 < 2.2e-16 ***
## Income_2013            1  14698879  14698879   29.834 1.121e-07 ***
## Regional_Networks      1  36772690  36772690   74.638 6.399e-16 ***
## Population_2015        1 133359868 133359868  270.682 < 2.2e-16 ***
## Non_belges2015         1   6025716   6025716   12.230 0.0005548 ***
## Density                1  14013038  14013038   28.442 2.135e-07 ***
## Residuals            254 125140914    492681                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(Model1)

##check the existence of heteroscedasticity
bptest(Model1)

## 
##  studentized Breusch-Pagan test
## 
## data:  Model1
## BP = 61.533, df = 7, p-value = 7.454e-11

##hetereoscedasticity cannot be rejected
##log transform of the dependent variable
Model2 <- lm(log(Outflow2014) ~ Road_traffic_total. + Unemployment_6months + Income_2013 + Regional_Networks + Population_2015 + Non_belges2015 + Density, data = Walsdf)
summary(Model2)

## 
## Call:
## lm(formula = log(Outflow2014) ~ Road_traffic_total. + Unemployment_6months + 
##     Income_2013 + Regional_Networks + Population_2015 + Non_belges2015 + 
##     Density, data = Walsdf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.57597 -0.29974  0.03831  0.39031  1.12045 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           8.559e+00  3.395e-01  25.215  < 2e-16 ***
## Road_traffic_total.   1.945e-03  5.052e-04   3.851 0.000149 ***
## Unemployment_6months -4.686e-02  9.778e-03  -4.793 2.80e-06 ***
## Income_2013          -2.228e-06  9.531e-06  -0.234 0.815309    
## Regional_Networks    -2.360e-03  2.458e-03  -0.960 0.337724    
## Population_2015      -1.281e-07  4.662e-06  -0.027 0.978094    
## Non_belges2015       -1.271e-02  6.489e-03  -1.958 0.051314 .  
## Density               6.538e-04  1.199e-04   5.450 1.19e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5132 on 254 degrees of freedom
## Multiple R-squared:  0.5763, Adjusted R-squared:  0.5647 
## F-statistic: 49.36 on 7 and 254 DF,  p-value: < 2.2e-16

plot(Model2)

bptest(Model2)

## 
##  studentized Breusch-Pagan test
## 
## data:  Model2
## BP = 32.361, df = 7, p-value = 3.481e-05

After testing Model1, not all variables showed expected sign, but all are significant:

Each increase in Road traffic cause increase of outflows, this may be logical because traffic appears mostly in rush hours in the morning when people going to work and in the evening when they go back from work, creating outflow number.
Each increase in Unemployment rate cause decrease of outflows and this is not expected, I supposed that the higher unemployment rate inside the commune the more people will commuting for job.
Each increase in income decrease number of outflows and this was expected. The higher income inside commune less you move to other what creates outflows, maybe it will be interesting in other project to run this model for inflows, to see will high income in commune increase number of inflows.
Negative sign also was unexpected for Regional Networks so each increase of road networks leads to decreasing the outflows, maybe as with income case It will be interesting to add this variable to check relationships between roads networks and inflows. Maybe the more developed the regional network the more inflows you have.
With the case of Population variable it was expected, the more population you have the more they move and create outflows and inflows.
For non native people we have have negative sign, the more non-belges you have in commune the less outflows you have, and It was expected because, as it was said they try to settle in the areas with big range of jobs and higher income, maybe also it will be interesting to add this variable to check the inflows, will it change the sign to positive in this case.
Density is the same like in population variable case, the higher density the more outflows you have.

But steel we have high value of R-square: 0.89 what is telling us that nearly 90% of outflows can be explained by this model, or.something is missing in it!

After log transformation of the dependent variable we faced to more unexpected results only 3 our independent variables are still significant on the high level: 1) Road traffic, 2) Unemployment rate, 3) Density, less significant rate of nonnative people. As a result, our R-squared fell down from 89% to 57%

Following this we are doing tests for spacial autocorrelation in regression results

For testing spatial autocorrelation, we need first defining neighbors and spatial weights.

Spatial weights and spatial autocorrelation analysis tools are found in the spatial dependence package: spdep

Creating a spatial weight matrix is a two stage process: defining a neighbors list object (nb) then converting it into a weights list object (using nb2listw function).

The nb objects is a list containing for each record a vector of the index of its neighbors. It can for example be based on distance-based information or on topological information retrieved from the original shapefile.

Let’s use contiguity information first and then a set of distance-based neighbors using the k nearest neighbors from polygon centroid.

Outflowsmap <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("grey95", "black"))(nrow(sdf)), 
         main = list(label = varmap, cex = 1))
}

Outflowsmap("Outflow2014", Walsdf)

Walsdf$resid.Model2 <- resid(Model2)
##subset, one commune dont have neighbours 
Walsdf3 <- subset(Walsdf, Walsdf$NOM1!="COMINES")
Outflowsmap("resid.Model2", Walsdf3)

Outflowsmap_B2R <- function(varmap, sdf) {
  spplot(sdf, zcol = varmap, col.regions = colorRampPalette(c("dodgerblue4", 
                                                              "snow1", "tomato3"))(nrow(sdf)), main = list(label = varmap, cex = 1))
}

Outflowsmap_B2R("resid.Model2", Walsdf3)

nb_Wal_contig <- poly2nb(Walsdf3)
summary(nb_Wal_contig)

## Neighbour list object:
## Number of regions: 261 
## Number of nonzero links: 1392 
## Percentage nonzero weights: 2.043423 
## Average number of links: 5.333333 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 
##  1  7 31 36 70 59 34 13  7  2  1 
## 1 least connected region:
## 80 with 1 link
## 1 most connected region:
## 125 with 11 links

# Neighbours list created k nearest neighbours
coords_Wal <- coordinates(Walsdf3)
knn_Wal4 <- knearneigh(coords_Wal, k = 4)
nb_Wal_k4 <- knn2nb(knn_Wal4)
knn_Wal10 <- knearneigh(coords_Wal, k = 10)
nb_Wal_k10 <- knn2nb(knn_Wal10)

par(mfrow = c(1, 3))  #places next 3 plots in a row

plot(Walsdf3, border = "grey")
title(main = "Neighbours list 1st order contiguity")
plot(nb_Wal_contig, coords_Wal, col = "green", add = TRUE)

plot(Walsdf3, border = "grey")
title(main = "Neighbours list knn=4")
plot(nb_Wal_k4, coords_Wal, add = TRUE)

plot(Walsdf3, border = "grey")
title(main = "Neighbours list knn=10")
plot(nb_Wal_k10, coords_Wal, col = "red", add = TRUE)

nb2listw_contig <- nb2listw(nb_Wal_contig, zero.policy=TRUE)
nb2listw_k4 <- nb2listw(nb_Wal_k4)
nb2listw_k10 <- nb2listw(nb_Wal_k10)

moran.test(Walsdf3$resid.Model2, listw = nb2listw_contig, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  Walsdf3$resid.Model2  
## weights: nb2listw_contig  
## 
## Moran I statistic standard deviate = 6.0974, p-value = 5.391e-10
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.232894952      -0.003846154       0.001507512

moran.test(Walsdf3$resid.Model2, listw = nb2listw_k4)

## 
##  Moran I test under randomisation
## 
## data:  Walsdf3$resid.Model2  
## weights: nb2listw_k4  
## 
## Moran I statistic standard deviate = 7.1209, p-value = 5.36e-13
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.290211584      -0.003846154       0.001705262

moran.test(Walsdf3$resid.Model2, listw = nb2listw_k10)

## 
##  Moran I test under randomisation
## 
## data:  Walsdf3$resid.Model2  
## weights: nb2listw_k10  
## 
## Moran I statistic standard deviate = 7.755, p-value = 4.416e-15
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.1979649227     -0.0038461538      0.0006772072

par(mfrow = c(1, 1))  #Back to one single plot display
moran.plot(Walsdf3$resid.Model2, listw = nb2listw_k4)

moranscatterplot <- function(Attr, listw, AttrName) {
  zAttr <- (Attr - mean(Attr))/sd(Attr)  #Standardise
  wzAttr <- lag.listw(listw, zAttr)  #Lag the variable
  morlm <- lm(wzAttr ~ zAttr)  #run OLS
  aa <- morlm$coefficients[1]  # to obtain intercept
  mori <- morlm$coefficients[2]  # and slope, i.e. Moran's I
  plot(zAttr, wzAttr, xlab = AttrName, ylab = paste("spatial lag of", AttrName))
  abline(aa, mori, col = 2)
  abline(h = 0, lty = 2, col = 4)
  abline(v = 0, lty = 2, col = 4)
  title(main = paste("Moran's scatterplot of ", AttrName, "( I=", round(mori, 
                                                                        4), ")"))
}

moranscatterplot(Walsdf3$resid.Model2, nb2listw_k4, "Residuals of Model 2")

##Local Index of Spatial Association
lisa <- localmoran(Walsdf3$resid.Model2, nb2listw_k4)
head(lisa)

##             Ii         E.Ii    Var.Ii        Z.Ii Pr(z > 0)
## 1 -0.147706190 -0.003846154 0.2450109 -0.29063471 0.6143346
## 2 -0.192200478 -0.003846154 0.2450109 -0.38052476 0.6482220
## 3  0.499275204 -0.003846154 0.2450109  1.01643609 0.1547109
## 4  0.006085643 -0.003846154 0.2450109  0.02006481 0.4919958
## 5  0.245326297 -0.003846154 0.2450109  0.50339320 0.3073439
## 6  0.236077693 -0.003846154 0.2450109  0.48470861 0.3139415

summary(lisa)

##        Ii                E.Ii               Var.Ii           Z.Ii         
##  Min.   :-0.83193   Min.   :-0.003846   Min.   :0.245   Min.   :-1.67294  
##  1st Qu.:-0.04383   1st Qu.:-0.003846   1st Qu.:0.245   1st Qu.:-0.08077  
##  Median : 0.14377   Median :-0.003846   Median :0.245   Median : 0.29823  
##  Mean   : 0.29021   Mean   :-0.003846   Mean   :0.245   Mean   : 0.59407  
##  3rd Qu.: 0.51337   3rd Qu.:-0.003846   3rd Qu.:0.245   3rd Qu.: 1.04492  
##  Max.   : 2.89577   Max.   :-0.003846   Max.   :0.245   Max.   : 5.85798  
##    Pr(z > 0)     
##  Min.   :0.0000  
##  1st Qu.:0.1480  
##  Median :0.3828  
##  Mean   :0.3630  
##  3rd Qu.:0.5322  
##  Max.   :0.9528

maplisa <- function(sdf, Attr, spid, listw) {
  lisa <- localmoran(Attr, listw)
  # Identify quadrant and significance
  zAttr <- (Attr - mean(Attr))/sd(Attr)  #Standardise
  wzAttr <- lag.listw(listw, zAttr)  #Standardize lagged variable
  LH <- cut(zAttr, breaks = c(min(zAttr), 0, max(zAttr)), include.lowest = TRUE, 
            labels = c("L", "H"))
  LHlag <- cut(wzAttr, breaks = c(min(wzAttr), 0, max(wzAttr)), include.lowest = TRUE, 
               labels = c("L", "H"))
  
  # Interact (L H) and (L H) lag to form (LL, LH, HL, HH)
  LISAType_qdrt <- as.matrix(interaction(LH, LHlag, drop = TRUE))
  LISAType_signif <- LISAType_qdrt
  LISAType_signif[lisa[, 5] > 0.05] <- "n.s"
  LISAType <- as.factor(LISAType_signif)
  
  # Append LISAType to lisa and to (new) spatial dataframe
  LISAsdf <- sdf
  LISAsdf$LISAType <- LISAType
  
  # fortify, merge and map
  LISAsdff <- fortify(LISAsdf, region = spid)
  LISAsdffwithdata <- merge(LISAsdff, LISAsdf@data, by.x = "id", by.y = spid)
  ggplot() + xlab(NULL) + ylab(NULL) + coord_equal() + geom_polygon(data = LISAsdffwithdata, 
                                                                    color = "grey", aes(x = long, y = lat, group = group, fill = LISAType)) + 
    scale_fill_manual(values = c(H.H = "tomato3", L.L = "dodgerblue4", H.L = "goldenrod1", 
                                 L.H = "olivedrab", n.s = "white"))
}

maplisa(Walsdf3, Walsdf3$resid.Model2, "INSCODE", nb2listw_k4)

moran.test(Walsdf3$Unemployment_6months, listw = nb2listw_k10)

## 
##  Moran I test under randomisation
## 
## data:  Walsdf3$Unemployment_6months  
## weights: nb2listw_k10  
## 
## Moran I statistic standard deviate = 18.277, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.4706461456     -0.0038461538      0.0006739845

moran.plot(Walsdf3$Unemployment_6months, listw = nb2listw_k10)

lisa <- localmoran(Walsdf3$Unemployment_6months, nb2listw_k10)
maplisa(Walsdf3, Walsdf3$Unemployment_6months, "INSCODE", nb2listw_k10)

Positive Moran’s I showing that places having different outflows as their neighbors and also we conclude there is no residual spatial autocorrelation across the 3 spatial weights used. The Moran I statistic indicates that for a 1 point increase we observe a 0.29 unit increase in contiguous communes. The presence of spatial autocorrelation confirms that residuals and the dependent variable in a specific commune affect the neighboring.

The LISA map for outflows identifies several communes within high-high and low-low clusters. Thus, we observe significant local spatial autocorrelation of flows within and around these communes.

Following spatial dependence test.

###Using log of explanatory variables to ease results interpretation:
Walsdf3$LogRoad_traffic_total. <- log(Walsdf3$Road_traffic_total.)
Walsdf3$LogUnemployment_6months <- log(Walsdf3$Unemployment_6months)
Walsdf3$LogIncome_2013 <- log(Walsdf3$Income_2013)
Walsdf3$LogRegional_Networks <- log(Walsdf3$Regional_Networks)
Walsdf3$LogPopulation_2015 <- log(Walsdf3$Population_2015)
Walsdf3$LogNon_belges2015 <- log(Walsdf3$Non_belges2015)
Walsdf3$LogDensity <- log(Walsdf3$Density)
Walsdf3$LogOutflow2014 <- log(Walsdf3$Outflow2014)
#Estimating the model by ordinary least squares regression
OLS <- lm(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf3)
summary(OLS)

## 
## Call:
## lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + 
##     LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + 
##     LogNon_belges2015 + LogDensity, data = Walsdf3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.82553 -0.13508  0.03121  0.16642  0.66495 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              8.26582    1.36577   6.052 5.13e-09 ***
## LogRoad_traffic_total.   0.08637    0.03464   2.493 0.013296 *  
## LogUnemployment_6months -0.42435    0.16962  -2.502 0.012988 *  
## LogIncome_2013          -0.67311    0.14192  -4.743 3.52e-06 ***
## LogRegional_Networks    -0.14992    0.04014  -3.735 0.000232 ***
## LogPopulation_2015       0.84311    0.05145  16.387  < 2e-16 ***
## LogNon_belges2015       -0.36231    0.03214 -11.272  < 2e-16 ***
## LogDensity               0.09031    0.04018   2.248 0.025458 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2457 on 253 degrees of freedom
## Multiple R-squared:  0.9027, Adjusted R-squared:    0.9 
## F-statistic: 335.1 on 7 and 253 DF,  p-value: < 2.2e-16

bptest<-bptest(OLS)
print(bptest)

## 
##  studentized Breusch-Pagan test
## 
## data:  OLS
## BP = 22.999, df = 7, p-value = 0.001705

#Significant non constant variance of residuals!
vif(OLS)

##  LogRoad_traffic_total. LogUnemployment_6months          LogIncome_2013 
##                4.247261                1.844100                1.608315 
##    LogRegional_Networks      LogPopulation_2015       LogNon_belges2015 
##                4.608943                7.687332                1.780018 
##              LogDensity 
##                8.027522

#Values below 10 suggest that multicollinearity is not an issue
##Spatial dependence test 
#Add the residuals to the spatial dataframe
Walsdf3$residOLS<-resid(OLS)
#Plot residuals
PLOTresidOLS<-spplot(
  Walsdf3, zcol="residOLS",
  col.regions=colorRampPalette(c("#EEE8AA","#3A5FCD"))
  (262))
print(PLOTresidOLS)

##Spatial Dependence Models
LMOLS <- lm.LMtests(OLS,nb2listw_contig,test=c("LMerr", "RLMerr","LMlag","RLMlag", "SARMA"))
print(LMOLS)

## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. +
## LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks +
## LogPopulation_2015 + LogNon_belges2015 + LogDensity, data =
## Walsdf3)
## weights: nb2listw_contig
## 
## LMerr = 77.659, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. +
## LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks +
## LogPopulation_2015 + LogNon_belges2015 + LogDensity, data =
## Walsdf3)
## weights: nb2listw_contig
## 
## RLMerr = 25.609, df = 1, p-value = 4.18e-07
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. +
## LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks +
## LogPopulation_2015 + LogNon_belges2015 + LogDensity, data =
## Walsdf3)
## weights: nb2listw_contig
## 
## LMlag = 104.71, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. +
## LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks +
## LogPopulation_2015 + LogNon_belges2015 + LogDensity, data =
## Walsdf3)
## weights: nb2listw_contig
## 
## RLMlag = 52.662, df = 1, p-value = 3.962e-13
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = LogOutflow2014 ~ LogRoad_traffic_total. +
## LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks +
## LogPopulation_2015 + LogNon_belges2015 + LogDensity, data =
## Walsdf3)
## weights: nb2listw_contig
## 
## SARMA = 130.32, df = 2, p-value < 2.2e-16

##Spatial Error Model 
SEM <- errorsarlm(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity,
                     data = Walsdf3, nb2listw_contig, zero.policy = TRUE)
summary(SEM)

## 
## Call:errorsarlm(formula = LogOutflow2014 ~ LogRoad_traffic_total. + 
##     LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + 
##     LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf3, 
##     listw = nb2listw_contig, zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.691836 -0.084091  0.016527  0.113180  0.379762 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                          Estimate Std. Error z value  Pr(>|z|)
## (Intercept)              3.486056   1.727885  2.0175  0.043640
## LogRoad_traffic_total.   0.065339   0.024333  2.6852  0.007248
## LogUnemployment_6months -0.062126   0.132355 -0.4694  0.638790
## LogIncome_2013          -0.271239   0.163785 -1.6561  0.097707
## LogRegional_Networks    -0.120968   0.028849 -4.1931 2.751e-05
## LogPopulation_2015       0.852105   0.035679 23.8822 < 2.2e-16
## LogNon_belges2015       -0.200218   0.035996 -5.5623 2.662e-08
## LogDensity              -0.104168   0.037579 -2.7720  0.005571
## 
## Lambda: 0.86875, LR test value: 134.74, p-value: < 2.22e-16
## Asymptotic standard error: 0.030798
##     z-value: 28.208, p-value: < 2.22e-16
## Wald statistic: 795.71, p-value: < 2.22e-16
## 
## Log likelihood: 67.46134 for error model
## ML residual variance (sigma squared): 0.027702, (sigma: 0.16644)
## Number of observations: 261 
## Number of parameters estimated: 10 
## AIC: -114.92, (AIC for lm: 17.822)

##Spatial autoregressive model
SAR<-lagsarlm(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity,
  data=Walsdf3, listw= nb2listw_contig)
summary(SAR)

## 
## Call:lagsarlm(formula = LogOutflow2014 ~ LogRoad_traffic_total. + 
##     LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + 
##     LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf3, 
##     listw = nb2listw_contig)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.713873 -0.116775  0.014278  0.127357  0.613707 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                          Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)              3.710714   1.089305   3.4065  0.000658
## LogRoad_traffic_total.   0.037441   0.026911   1.3913  0.164139
## LogUnemployment_6months -0.213895   0.131283  -1.6293  0.103257
## LogIncome_2013          -0.510333   0.109518  -4.6598 3.165e-06
## LogRegional_Networks    -0.077893   0.031490  -2.4736  0.013376
## LogPopulation_2015       0.825892   0.039808  20.7471 < 2.2e-16
## LogNon_belges2015       -0.269618   0.025398 -10.6158 < 2.2e-16
## LogDensity              -0.034998   0.031575  -1.1084  0.267685
## 
## Rho: 0.37782, LR test value: 121.04, p-value: < 2.22e-16
## Asymptotic standard error: 0.027728
##     z-value: 13.626, p-value: < 2.22e-16
## Wald statistic: 185.67, p-value: < 2.22e-16
## 
## Log likelihood: 60.60695 for lag model
## ML residual variance (sigma squared): 0.035694, (sigma: 0.18893)
## Number of observations: 261 
## Number of parameters estimated: 10 
## AIC: -101.21, (AIC for lm: 17.822)
## LM test for residual autocorrelation
## test value: 44.718, p-value: 2.2757e-11

##Spatial Simultaneous Autoregressive Model
SAC<- sacsarlm(
  LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity,
  data=Walsdf3, nb2listw_contig,listw2=NULL)
summary(SAC)

## 
## Call:sacsarlm(formula = LogOutflow2014 ~ LogRoad_traffic_total. + 
##     LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + 
##     LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf3, 
##     listw = nb2listw_contig, listw2 = NULL)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.653691 -0.092300  0.022833  0.116948  0.421300 
## 
## Type: sac 
## Coefficients: (asymptotic standard errors) 
##                          Estimate Std. Error z value  Pr(>|z|)
## (Intercept)              2.395892   1.391691  1.7216 0.0851476
## LogRoad_traffic_total.   0.050298   0.024808  2.0275 0.0426130
## LogUnemployment_6months -0.143359   0.130552 -1.0981 0.2721592
## LogIncome_2013          -0.399983   0.134330 -2.9776 0.0029051
## LogRegional_Networks    -0.110230   0.029392 -3.7503 0.0001766
## LogPopulation_2015       0.904872   0.036723 24.6406 < 2.2e-16
## LogNon_belges2015       -0.228005   0.031211 -7.3053 2.767e-13
## LogDensity              -0.111134   0.034504 -3.2209 0.0012778
## 
## Rho: 0.3201
## Asymptotic standard error: 0.039364
##     z-value: 8.1318, p-value: 4.4409e-16
## Lambda: 0.58761
## Asymptotic standard error: 0.069994
##     z-value: 8.3952, p-value: < 2.22e-16
## 
## LR test value: 164.82, p-value: < 2.22e-16
## 
## Log likelihood: 82.4989 for sac model
## ML residual variance (sigma squared): 0.028061, (sigma: 0.16752)
## Number of observations: 261 
## Number of parameters estimated: 11 
## AIC: -143, (AIC for lm: 17.822)

Results of testing multicollinearity and heteroskedasticity test:

Numbers of outflows are:

Increased road traffic has a positive impact.
Unemployment rate decrease the number of outflows.
Higher income decrease number of commuting outflows.
Developed regional road network decrease the number of outflows.
The bigger population of the commune increasing number of outflows.
The bigger number of non-belges decrease number of outflows.
Density of population increase the number of outflows.
90% of outflows are explained by the considered (and significant) variables.

Testing for multicollinearity issues shows that our model dont have multicollinearity issues becouse our values are below 10, but one variable is very close to this number - Density with value ~8.

LM test results shows that our model that we don’t have spatial error dependence with in OLS model.

The spatially autoregressive model shows a significant rho value indicating spatial dependence of the dependent variable. The results of the coefficients indicate a different outcome compared to the original OLS results. Now variable Density have negative impact on outflows. Most significant variables, which have positive impact on outflows, are population number and road traffic. Level of income and number of foreigners are decreasing outflows from commune.

##GWR 
library(lmtest)
library(car)
library(MASS)
library(rgeos)
library(RColorBrewer)
library(maptools)
library(classInt)
library(GWmodel)

## Warning: package 'GWmodel' was built under R version 3.3.3

## Loading required package: robustbase

## Warning: package 'robustbase' was built under R version 3.3.3

## 
## Attaching package: 'robustbase'

## The following object is masked from 'package:survival':
## 
##     heart

## Loading required package: Rcpp

## Welcome to GWmodel version 2.0-4.
##  Note: This verision has been re-built with RcppArmadillo to improve its performance.

Walsdf2 <- Walsdf
##transforming variables to logarithm 
Walsdf2$LogRoad_traffic_total. <- log(Walsdf2$Road_traffic_total.)
Walsdf2$LogUnemployment_6months <- log(Walsdf2$Unemployment_6months)
Walsdf2$LogIncome_2013 <- log(Walsdf2$Income_2013)
Walsdf2$LogRegional_Networks <- log(Walsdf2$Regional_Networks)
Walsdf2$LogPopulation_2015 <- log(Walsdf2$Population_2015)
Walsdf2$LogNon_belges2015 <- log(Walsdf2$Non_belges2015)
Walsdf2$LogDensity <- log(Walsdf2$Density)
Walsdf2$LogOutflow2014 <- log(Walsdf2$Outflow2014)
##creating dependent variable and ...
DepVar<- Walsdf2$LogOutflow2014
##data frame of independent
expl_vars<- data.frame(Unemployment_6months = Walsdf2$LogUnemployment_6months, Income_2013 = Walsdf2$LogIncome_2013, Regional_Networks = Walsdf2$LogRegional_Networks, Population_2015 = Walsdf2$LogPopulation_2015, Non_belges2015 = Walsdf2$LogNon_belges2015, Density = Walsdf2$LogDensity, Road_traffic_total. = Walsdf2$LogRoad_traffic_total.)
IndVars<- names(expl_vars)
cls<-cor(expl_vars, method = "pearson")
cls

##                      Unemployment_6months Income_2013 Regional_Networks
## Unemployment_6months           1.00000000  0.45219373         0.2275134
## Income_2013                    0.45219373  1.00000000        -0.1628057
## Regional_Networks              0.22751337 -0.16280567         1.0000000
## Population_2015               -0.33106232 -0.20477789         0.2442602
## Non_belges2015                -0.23799909 -0.25133937        -0.1181432
## Density                       -0.48473178 -0.10562502        -0.3283707
## Road_traffic_total.           -0.01536594 -0.03431514         0.5810853
##                      Population_2015 Non_belges2015    Density
## Unemployment_6months      -0.3310623     -0.2379991 -0.4847318
## Income_2013               -0.2047779     -0.2513394 -0.1056250
## Regional_Networks          0.2442602     -0.1181432 -0.3283707
## Population_2015            1.0000000      0.5454102  0.7597231
## Non_belges2015             0.5454102      1.0000000  0.6057748
## Density                    0.7597231      0.6057748  1.0000000
## Road_traffic_total.        0.7446633      0.2850621  0.3788065
##                      Road_traffic_total.
## Unemployment_6months         -0.01536594
## Income_2013                  -0.03431514
## Regional_Networks             0.58108534
## Population_2015               0.74466326
## Non_belges2015                0.28506212
## Density                       0.37880652
## Road_traffic_total.           1.00000000

DepVar<-"LogOutflow2014"

model.sel<-model.selection.gwr(DepVar,IndVars,data=Walsdf2,kernel="bisquare", adaptive = TRUE, bw=20)

## Now calbrating the model: 
##  LogOutflow2014~Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Income_2013 
## Now calbrating the model: 
##  LogOutflow2014~Regional_Networks 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015 
## Now calbrating the model: 
##  LogOutflow2014~Non_belges2015 
## Now calbrating the model: 
##  LogOutflow2014~Density 
## Now calbrating the model: 
##  LogOutflow2014~Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Income_2013 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Regional_Networks 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Density 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Income_2013 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Density 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Income_2013 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Income_2013 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Income_2013+Unemployment_6months 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Income_2013+Road_traffic_total. 
## Now calbrating the model: 
##  LogOutflow2014~Population_2015+Non_belges2015+Regional_Networks+Density+Income_2013+Road_traffic_total.+Unemployment_6months

sorted.models <- model.sort.gwr(model.sel, numVars = length(IndVars),
                                ruler.vector = model.sel[[2]][,2])
model.list <- sorted.models[[1]]
X11(width = 12, height = 6)
model.view.gwr(DepVar, IndVars, model.list = model.list)

X11(width = 12, height = 6)
plot(sorted.models[[2]][,2], col = "black", pch = 20, lty = 5,
     main = "Alternative view of model selection procedure",
     ylab = "AICc", xlab = "Model number", type = "b")

##The above figures suggest that we should select model 28 because this has the lowest AICc. We ##first select the optimal bandwidth and then fit the basic local model:
bw.a <- bw.gwr(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf2, approach = "AICc", kernel = "bisquare", adaptive = TRUE)

## Adaptive bandwidth (number of nearest neighbours): 169 AICc value: -23.97292 
## Adaptive bandwidth (number of nearest neighbours): 112 AICc value: -49.37453 
## Adaptive bandwidth (number of nearest neighbours): 76 AICc value: -70.38208 
## Adaptive bandwidth (number of nearest neighbours): 54 AICc value: -65.6204 
## Adaptive bandwidth (number of nearest neighbours): 89 AICc value: -65.3176 
## Adaptive bandwidth (number of nearest neighbours): 67 AICc value: -73.06261 
## Adaptive bandwidth (number of nearest neighbours): 62 AICc value: -69.89982 
## Adaptive bandwidth (number of nearest neighbours): 70 AICc value: -73.45803 
## Adaptive bandwidth (number of nearest neighbours): 72 AICc value: -72.15264 
## Adaptive bandwidth (number of nearest neighbours): 68 AICc value: -73.54795 
## Adaptive bandwidth (number of nearest neighbours): 68 AICc value: -73.54795

bw.a

## [1] 68

gwr.res <- gwr.basic(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity,data=Walsdf2, bw=bw.a, kernel = "bisquare",
                     adaptive = TRUE, F123.test = TRUE)
print(gwr.res)

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2017-07-03 02:18:01 
##    Call:
##    gwr.basic(formula = LogOutflow2014 ~ LogRoad_traffic_total. + 
##     LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + 
##     LogPopulation_2015 + LogNon_belges2015 + LogDensity, data = Walsdf2, 
##     bw = bw.a, kernel = "bisquare", adaptive = TRUE, F123.test = TRUE)
## 
##    Dependent (y) variable:  LogOutflow2014
##    Independent variables:  LogRoad_traffic_total. LogUnemployment_6months LogIncome_2013 LogRegional_Networks LogPopulation_2015 LogNon_belges2015 LogDensity
##    Number of data points: 262
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03452 -0.13488  0.03305  0.16206  0.68267 
## 
##    Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)              8.18560    1.41214   5.797 2.00e-08 ***
##    LogRoad_traffic_total.   0.09497    0.03576   2.656 0.008409 ** 
##    LogUnemployment_6months -0.47327    0.17500  -2.704 0.007305 ** 
##    LogIncome_2013          -0.64611    0.14661  -4.407 1.55e-05 ***
##    LogRegional_Networks    -0.15957    0.04144  -3.851 0.000149 ***
##    LogPopulation_2015       0.84339    0.05320  15.853  < 2e-16 ***
##    LogNon_belges2015       -0.38282    0.03287 -11.646  < 2e-16 ***
##    LogDensity               0.08817    0.04154   2.122 0.034770 *  
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.254 on 254 degrees of freedom
##    Multiple R-squared: 0.8962
##    Adjusted R-squared: 0.8933 
##    F-statistic: 313.2 on 7 and 254 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 16.39299
##    Sigma(hat): 0.2510975
##    AIC:  35.39319
##    AICc:  36.10747
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 68 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                  Min.    1st Qu.     Median    3rd Qu.
##    Intercept               -11.970000  -1.888000   1.880000   9.276000
##    LogRoad_traffic_total.   -0.066070  -0.002775   0.038450   0.075510
##    LogUnemployment_6months  -1.304000  -0.357300  -0.110000   0.086350
##    LogIncome_2013           -1.780000  -0.773100  -0.156100   0.140000
##    LogRegional_Networks     -0.619100  -0.114100  -0.065750  -0.044540
##    LogPopulation_2015        0.539800   0.799000   0.853000   0.902900
##    LogNon_belges2015        -0.582900  -0.360500  -0.222700  -0.122800
##    LogDensity               -0.368100  -0.048970   0.010050   0.104200
##                               Max.
##    Intercept               19.4200
##    LogRoad_traffic_total.   0.5688
##    LogUnemployment_6months  0.8106
##    LogIncome_2013           1.2750
##    LogRegional_Networks     0.0730
##    LogPopulation_2015       1.0580
##    LogNon_belges2015        0.1152
##    LogDensity               0.2485
##    ************************Diagnostic information*************************
##    Number of data points: 262 
##    Effective number of parameters (2trace(S) - trace(S'S)): 83.38452 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 178.6155 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): -73.54795 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -186.6502 
##    Residual sum of squares: 5.864239 
##    R-square value:  0.9628579 
##    Adjusted R-square value:  0.945421 
##    ******************F test results of GWR calibration********************
##    ---F1 test (Leung et al. 2000)
##     F1 statistic Numerator DF Denominator DF     Pr(>)    
##          0.50871    201.68382            254 3.752e-07 ***
##    ---F2 test (Leung et al. 2000)
##     F2 statistic Numerator DF Denominator DF     Pr(>)    
##           2.1641     103.4093            254 4.644e-07 ***
##    ---F3 test (Leung et al. 2000)
##                            F3 statistic Numerator DF Denominator DF
##    Intercept                    5.87267     87.54336         201.68
##    LogRoad_traffic_total.       1.90820     93.88837         201.68
##    LogUnemployment_6months      0.84144     88.63470         201.68
##    LogIncome_2013               4.54567     97.07097         201.68
##    LogRegional_Networks         1.29183     86.64788         201.68
##    LogPopulation_2015           0.85715     88.82904         201.68
##    LogNon_belges2015            3.42688     90.95265         201.68
##    LogDensity                   1.97262    125.95909         201.68
##                                Pr(>)    
##    Intercept               < 2.2e-16 ***
##    LogRoad_traffic_total.  7.538e-05 ***
##    LogUnemployment_6months    0.8216    
##    LogIncome_2013          < 2.2e-16 ***
##    LogRegional_Networks       0.0730 .  
##    LogPopulation_2015         0.7945    
##    LogNon_belges2015       2.052e-13 ***
##    LogDensity              8.245e-06 ***
##    ---F4 test (GWR book p92)
##     F4 statistic Numerator DF Denominator DF     Pr(>)    
##          0.35773    178.61548            254 7.358e-13 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    ***********************************************************************
##    Program stops at: 2017-07-03 02:18:11

##GWR map Unemployment 6 months
mypalette.4 <- brewer.pal(4, "RdPu")
X11(width=10,height=12)
spplot (gwr.res$SDF, "LogUnemployment_6months", key.space = "right", col.regions = mypalette.4, at = c(-1.304000,  -0.357300,  -0.110000,   0.086350,  0.8106),main = "GW regression coefficient estimates for Unemployment during 6months")

##GWR map Road Traffic 
mypalette.4 <- brewer.pal(4, "Greys")
X11(width=10,height=12)
spplot (gwr.res$SDF, "LogRoad_traffic_total.", key.space = "right", col.regions = mypalette.4, at = c(-0.066070,  -0.002775,   0.038450,   0.075510,  0.5688),main = "GW regression coefficient estimates for Total Road traffic")

##GWR map Income 2013
mypalette.4 <- brewer.pal(4, "Blues")
X11(width=10,height=12)
spplot (gwr.res$SDF, "LogIncome_2013", key.space = "right", col.regions = mypalette.4, at = c(-1.780000,  -0.773100,  -0.156100,   0.140000,  1.2750),main = "GW regression coefficient estimates for Income 2013")

##GWR map Regional Networks 
mypalette.4 <- brewer.pal(4, "Blues")
X11(width=10,height=12)
spplot (gwr.res$SDF, "LogRegional_Networks", key.space = "right", col.regions = mypalette.4, at = c(-0.619100,  -0.114100,  -0.065750,  -0.044540,  0.0730),main = "GW regression coefficient estimates for Regional road Networks")

##GWR map Non Belgium population 
mypalette.4 <- brewer.pal(4, "Blues")
X11(width=10,height=12)
spplot (gwr.res$SDF, "LogNon_belges2015", key.space = "right", col.regions = mypalette.4, at = c(-0.582900, -0.360500, -0.222700, -0.122800, 0.1152),main = "GW regression coefficient estimates for Non Belgium population")

##checking is whether there significant local multi-collinearity among the independent variables
diagnostics<-gwr.collin.diagno(LogOutflow2014 ~ LogRoad_traffic_total. + LogUnemployment_6months + LogIncome_2013 + LogRegional_Networks + LogPopulation_2015 + LogNon_belges2015 + LogDensity,data=Walsdf2,bw=bw.a,
                               kernel = "bisquare", adaptive = TRUE)
summary(diagnostics$SDF)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##        min      max
## x 42262.92 295156.8
## y 21153.22 167685.8
## Is projected: TRUE 
## proj4string :
## [+init=epsg:31370 +proj=lcc +lat_1=51.16666723333333
## +lat_2=49.8333339 +lat_0=90 +lon_0=4.367486666666666
## +x_0=150000.013 +y_0=5400088.438 +ellps=intl
## +towgs84=-106.8686,52.2978,-103.7239,0.3366,-0.457,1.8422,-1.2747
## +units=m +no_defs]
## Data attributes:
##  LogRoad_traffic_total._VIF LogUnemployment_6months_VIF LogIncome_2013_VIF
##  Min.   : 2.878             Min.   :1.355               Min.   :1.241     
##  1st Qu.: 3.759             1st Qu.:1.654               1st Qu.:1.960     
##  Median : 4.486             Median :1.846               Median :2.444     
##  Mean   : 5.380             Mean   :1.880               Mean   :2.579     
##  3rd Qu.: 6.522             3rd Qu.:2.054               3rd Qu.:3.217     
##  Max.   :10.612             Max.   :2.820               Max.   :4.328     
##                                                                           
##  LogRegional_Networks_VIF LogPopulation_2015_VIF LogNon_belges2015_VIF
##  Min.   : 3.160           Min.   : 4.304         Min.   :1.206        
##  1st Qu.: 4.277           1st Qu.: 6.557         1st Qu.:2.190        
##  Median : 5.016           Median : 7.865         Median :2.639        
##  Mean   : 5.631           Mean   : 9.546         Mean   :3.370        
##  3rd Qu.: 6.982           3rd Qu.:11.139         3rd Qu.:4.230        
##  Max.   :10.165           Max.   :23.509         Max.   :9.197        
##                                                                       
##  LogDensity_VIF      local_CN      Intercept_VDP   
##  Min.   : 3.272   Min.   : 333.7   Min.   :0.5189  
##  1st Qu.: 6.028   1st Qu.: 508.5   1st Qu.:0.9170  
##  Median : 8.093   Median : 606.9   Median :0.9519  
##  Mean   : 9.004   Mean   : 617.2   Mean   :0.9320  
##  3rd Qu.:10.021   3rd Qu.: 684.1   3rd Qu.:0.9762  
##  Max.   :24.958   Max.   :1265.6   Max.   :0.9946  
##                                                    
##  LogRoad_traffic_total._VDP LogUnemployment_6months_VDP LogIncome_2013_VDP
##  Min.   :0.0000007          Min.   :0.0000001           Min.   :0.8812    
##  1st Qu.:0.0333841          1st Qu.:0.0157851           1st Qu.:0.9596    
##  Median :0.1035997          Median :0.0583175           Median :0.9702    
##  Mean   :0.1232443          Mean   :0.0911896           Mean   :0.9659    
##  3rd Qu.:0.1943633          3rd Qu.:0.1278506           3rd Qu.:0.9790    
##  Max.   :0.4838005          Max.   :0.4419177           Max.   :0.9963    
##                                                                           
##  LogRegional_Networks_VDP LogPopulation_2015_VDP LogNon_belges2015_VDP
##  Min.   :0.0000022        Min.   :0.0000036      Min.   :0.0000007    
##  1st Qu.:0.0153198        1st Qu.:0.0057180      1st Qu.:0.0127011    
##  Median :0.0722376        Median :0.0282982      Median :0.0767879    
##  Mean   :0.1186868        Mean   :0.0418708      Mean   :0.1322904    
##  3rd Qu.:0.1851158        3rd Qu.:0.0621585      3rd Qu.:0.2084201    
##  Max.   :0.4361819        Max.   :0.2059242      Max.   :0.5378926    
##                                                                       
##  LogDensity_VDP      Corr_Intercept.LogRoad_traffic_total.
##  Min.   :0.0000009   Min.   :0                            
##  1st Qu.:0.0106005   1st Qu.:0                            
##  Median :0.0537197   Median :0                            
##  Mean   :0.0945768   Mean   :0                            
##  3rd Qu.:0.1511137   3rd Qu.:0                            
##  Max.   :0.4170393   Max.   :0                            
##                      NA's   :261                          
##  Corr_Intercept.LogUnemployment_6months Corr_Intercept.LogIncome_2013
##  Min.   :0                              Min.   :0                    
##  1st Qu.:0                              1st Qu.:0                    
##  Median :0                              Median :0                    
##  Mean   :0                              Mean   :0                    
##  3rd Qu.:0                              3rd Qu.:0                    
##  Max.   :0                              Max.   :0                    
##  NA's   :261                            NA's   :261                  
##  Corr_Intercept.LogRegional_Networks Corr_Intercept.LogPopulation_2015
##  Min.   :0                           Min.   :0                        
##  1st Qu.:0                           1st Qu.:0                        
##  Median :0                           Median :0                        
##  Mean   :0                           Mean   :0                        
##  3rd Qu.:0                           3rd Qu.:0                        
##  Max.   :0                           Max.   :0                        
##  NA's   :261                         NA's   :261                      
##  Corr_Intercept.LogNon_belges2015 Corr_Intercept.LogDensity
##  Min.   :0                        Min.   :0                
##  1st Qu.:0                        1st Qu.:0                
##  Median :0                        Median :0                
##  Mean   :0                        Mean   :0                
##  3rd Qu.:0                        3rd Qu.:0                
##  Max.   :0                        Max.   :0                
##  NA's   :261                      NA's   :261              
##  Corr_LogRoad_traffic_total..LogUnemployment_6months
##  Min.   :-0.30717                                   
##  1st Qu.:-0.13571                                   
##  Median :-0.03547                                   
##  Mean   :-0.02026                                   
##  3rd Qu.: 0.10855                                   
##  Max.   : 0.31723                                   
##                                                     
##  Corr_LogRoad_traffic_total..LogIncome_2013
##  Min.   :-0.50696                          
##  1st Qu.:-0.24156                          
##  Median :-0.10948                          
##  Mean   :-0.09771                          
##  3rd Qu.: 0.01777                          
##  Max.   : 0.35821                          
##                                            
##  Corr_LogRoad_traffic_total..LogRegional_Networks
##  Min.   :0.4768                                  
##  1st Qu.:0.6549                                  
##  Median :0.7086                                  
##  Mean   :0.7035                                  
##  3rd Qu.:0.7647                                  
##  Max.   :0.8714                                  
##                                                  
##  Corr_LogRoad_traffic_total..LogPopulation_2015
##  Min.   :0.6714                                
##  1st Qu.:0.7266                                
##  Median :0.7614                                
##  Mean   :0.7626                                
##  3rd Qu.:0.7951                                
##  Max.   :0.8528                                
##                                                
##  Corr_LogRoad_traffic_total..LogNon_belges2015
##  Min.   :-0.0815                              
##  1st Qu.: 0.1711                              
##  Median : 0.3232                              
##  Mean   : 0.2889                              
##  3rd Qu.: 0.3950                              
##  Max.   : 0.5956                              
##                                               
##  Corr_LogRoad_traffic_total..LogDensity
##  Min.   :0.1558                        
##  1st Qu.:0.3442                        
##  Median :0.4015                        
##  Mean   :0.4027                        
##  3rd Qu.:0.4761                        
##  Max.   :0.5877                        
##                                        
##  Corr_LogUnemployment_6months.LogIncome_2013
##  Min.   :0.05445                            
##  1st Qu.:0.47293                            
##  Median :0.57274                            
##  Mean   :0.54594                            
##  3rd Qu.:0.67321                            
##  Max.   :0.77519                            
##                                             
##  Corr_LogUnemployment_6months.LogRegional_Networks
##  Min.   :-0.19078                                 
##  1st Qu.:-0.05521                                 
##  Median : 0.01547                                 
##  Mean   : 0.05497                                 
##  3rd Qu.: 0.14634                                 
##  Max.   : 0.43803                                 
##                                                   
##  Corr_LogUnemployment_6months.LogPopulation_2015
##  Min.   :-0.57350                               
##  1st Qu.:-0.39692                               
##  Median :-0.26624                               
##  Mean   :-0.26075                               
##  3rd Qu.:-0.14713                               
##  Max.   : 0.05209                               
##                                                 
##  Corr_LogUnemployment_6months.LogNon_belges2015
##  Min.   :-0.6330                               
##  1st Qu.:-0.4962                               
##  Median :-0.3794                               
##  Mean   :-0.3033                               
##  3rd Qu.:-0.1534                               
##  Max.   : 0.2519                               
##                                                
##  Corr_LogUnemployment_6months.LogDensity
##  Min.   :-0.66827                       
##  1st Qu.:-0.57812                       
##  Median :-0.45158                       
##  Mean   :-0.39509                       
##  3rd Qu.:-0.25683                       
##  Max.   : 0.09945                       
##                                         
##  Corr_LogIncome_2013.LogRegional_Networks
##  Min.   :-0.53816                        
##  1st Qu.:-0.28905                        
##  Median :-0.17496                        
##  Mean   :-0.14788                        
##  3rd Qu.:-0.03109                        
##  Max.   : 0.33124                        
##                                          
##  Corr_LogIncome_2013.LogPopulation_2015
##  Min.   :-0.7429                       
##  1st Qu.:-0.4615                       
##  Median :-0.3416                       
##  Mean   :-0.2966                       
##  3rd Qu.:-0.1168                       
##  Max.   : 0.2296                       
##                                        
##  Corr_LogIncome_2013.LogNon_belges2015 Corr_LogIncome_2013.LogDensity
##  Min.   :-0.8445                       Min.   :-0.802641             
##  1st Qu.:-0.6141                       1st Qu.:-0.631377             
##  Median :-0.4482                       Median :-0.438625             
##  Mean   :-0.3874                       Mean   :-0.309246             
##  3rd Qu.:-0.2603                       3rd Qu.:-0.002343             
##  Max.   : 0.3943                       Max.   : 0.465242             
##                                                                      
##  Corr_LogRegional_Networks.LogPopulation_2015
##  Min.   :0.01776                             
##  1st Qu.:0.38077                             
##  Median :0.51520                             
##  Mean   :0.48649                             
##  3rd Qu.:0.62557                             
##  Max.   :0.76748                             
##                                              
##  Corr_LogRegional_Networks.LogNon_belges2015
##  Min.   :-0.43120                           
##  1st Qu.:-0.09785                           
##  Median : 0.03302                           
##  Mean   : 0.01789                           
##  3rd Qu.: 0.14557                           
##  Max.   : 0.33645                           
##                                             
##  Corr_LogRegional_Networks.LogDensity
##  Min.   :-0.55563                    
##  1st Qu.:-0.19105                    
##  Median :-0.01973                    
##  Mean   :-0.05192                    
##  3rd Qu.: 0.11125                    
##  Max.   : 0.34954                    
##                                      
##  Corr_LogPopulation_2015.LogNon_belges2015
##  Min.   :0.1283                           
##  1st Qu.:0.3934                           
##  Median :0.5498                           
##  Mean   :0.5208                           
##  3rd Qu.:0.6233                           
##  Max.   :0.8364                           
##                                           
##  Corr_LogPopulation_2015.LogDensity Corr_LogNon_belges2015.LogDensity
##  Min.   :0.6264                     Min.   :0.05735                  
##  1st Qu.:0.6912                     1st Qu.:0.61258                  
##  Median :0.7257                     Median :0.72908                  
##  Mean   :0.7414                     Mean   :0.66142                  
##  3rd Qu.:0.7772                     3rd Qu.:0.82014                  
##  Max.   :0.8846                     Max.   :0.88637                  
##

The results of the geographically weighted regression in a bandwidth of 68 nearest neighbors and F3 test showing us that not all independent variables are equal in explaining variance across space. The logs of unemployment rate and population indicate stationarity, which means that the two variables have a consistent effect on explaining outflows across space.

To understand the results of the GWR, we plot each variable to determine its impact on flows:

Log of the Road traffic is an indicator of outflows in the north of Wallonia near the Tournai, Namur and Malmedy in the east. As road traffic increases, outflows also increases.
Log of the Unemployment rate inverse indicator of outflows in the middle of the Wallonia going from Charleroi in the west to Liege in the east. As unemployment rate increase, outflows decrease.
Log of Income strong indicator of outflows in the middle of Wallonia which goes from Chimay to Spa and also decrease the outflows with increasing.
Log of Regional Networks is inverse indicator more characteristic for Mons, Liege and south of Wallonia.
Log of non-Belgium people inverse indicator of outflows more spread near Mons and Liege.

Multi-collinearity diagnostics results

The rules of thump suggest that local VIFs should be below 10 and local correlation coefficients below 0,8 or above -0,8 in case of negative correlations for the results to be defensible. The following GWR coefficients display significant multi-collinearity between road traffic, regional networks and population what is actually logic, and high coefficients between income, non-Belgium people and density and population, what is saying to us that it will be better to exclude some kind of duplicating indicators such as density and population, road traffic and regional networks.

Conclusion

In this project, I analyzed and studied flows and their causes using various methods of statistics and econometrics. I tried to explain outflows from different communes by unemployment rate, road traffic, size of regional road networks, income, by foreign population, density and number of population. During my analysis of flows, I faced to some limitation, due to freedom of movement in Europe Union, part of flows from Wallonia region did not take into account, but they make huge impact of the region. Maybe to make more accurate analyze of flows it will be better to include in the study area such important cities as Lille in France, Aachen in Germany, borders communes of Luxembourg and also capital of Belgium - Brussels. It was possible to see from the map of income that people with higher income are living in communes on border with Brussels capital region and near the border with Luxemburg, according to this we can suppose that people are living in Wallonia region but going to work in these regions. Difference in level of wages and number of job places make these cities attractive for commuting flows.

During the study process, we determined five main cities on north in Wallonia region. They are Tournai, Mons, Charleroi, Namur and Liege. Most numbers of flows in Wallonia are going through this cities, we can only suppose why, in my opinion, because of:

the wide range of activities which are taking place in these cities,
developed industry and services sector provide wide range of jobs,
main universities and schools are situated there,
lots of people and goods are moving through this cities when they are going from France to Germany and vise-versa,
because developed road infrastructure allows to do this, etc.

I think developed road, railway infrastructure is directly interconnected with flows in Wallonia, if we will look to the map of roads in Wallonia and compere it the map of flows, which was produced during this project it is possible to claim that flows are connected to the main motorways E42 and E19. It is obvious that number of flows are less in the south of Wallonia region, because of luck of good infrastructure and urbanism, but it is depend on what kind of result we expect, south of Wallonia region is covered by forests and fields make this place attractive for housing and family life. But in case of commuting flows to the work through the developing regional road networks possible to reach reducing the time of the journey to work and by this attract more people from periphery of the region, make people work in the urban centers and earn higher wages, reduce the unemployment rate and keep green communes green without useless urbanization of them. In its turn people will bring high wages to their communes from urbans develop and support them in way more suitable for calm and fresh life. Results of our regression model shows that regional road network is significant inversed variable for outflows, increase in one unit of it, outflows decreasing, is possible to suppose that it will have invested result for inflows, so the more developed road network you have more people are going in your commune, but not out.

Another less strong variable related to the roads, what also was shown by GWR analysis is traffic jams. It have positive sign and high level of significance for increasing outflows. Usually traffic jams appears during the rush ours in the cities with higher number of population GWR multi-collinearity test also showed this relationship. It is possible to suppose obvious thing that more traffic jams you have, more inflows and outflows you have. Maybe next time it will better to use railways network or number of cars per family to avoid the multi-collinearity between this variable and regional road networks to explain outflows.

Significant variable in explaining outflows is Population variable. It has positive sign each increase in 1 unit of it increase the outflows from commune. As in case with traffic jams GWR multi collinearity test shows that this variable have positive correlation with Density and Foreigners, and it is obvious, in future test it will be better to use one of them density or population, in my opinion, it is better to use population number and replace density by average age of people, keeping in head, that the younger the people the more they are commuting.

Let discuss Density of population. What is interesting with this variable that during all the test it had positive sign and each increase by 1 unit increase outflows, but when we start to do Lagrange multiplier diagnostic tests for spatial dependence it changed the sign to negative but during the GWR sign changed again to positive but this variable start to be less significant. I think it is good idea to replace it from the model by another one.

Another discover for me during working on this project war behavior of significant variable Unemployment rate during the 6 months. In the beginning I was expecting that the higher rate of unemployment in commune it will lead to more increasing number of outflows. On practice, all the tests showed that with the increasing this variable by 1 unit decrease number of outflows. This kind of behavior of this variable I didn’t expected and for me it is still difficult to explain. Maybe by the calibrating the model and testing with dependent variables inflows and outflows will shed light on it. Now I am thinking that 6 months is too short term and according to developed system of government help to unemployment people have more time for searching suitable job for them without financial hardship.

Strong significant variable is number of foreigners and non-belges, all the test showed high significance of this variable, multi-collinearity diagnostics showed positive multi-collinearity between population, density, income and number of non-belges. Each increase of this variable by one unit leading decrease number of outflows, what was expected in the beginning because as usual they are coming to other places due to economic factor and higher number of different jobs in this places, usually developed cities with high number of people.

Last significant variable with expected negative sign is Income. Each increase of this variable leads to decreasing outflows from the commune. It is possible to explain that higher income attract inflows to the commune than outflows. People commuting to the communes with higher income. It is also can be explained by the Luxembourgish cross border relationships, when thousands of people are coming from border communes to work in Luxembourg because of the higher minimum wage, but leaving in another countries. In my opinion, it will have opposed sign if we will put it in the model for inflows as a dependent variable.

As a conclusion, it is possible to assume that even all the diagnostics and test showed very high R-squared value 0.8-9 we cannot be sure that this model explains 90% of outflows in Wallonia region, but vice versa, gives to use sign that something is missing or wrong with this model and it need to be calibrated. Multi-collinearity diagnostics showed that some variables such as density and population, road network and traffic jams in practice are duplicating each other and need to be replaced by another. Still Population, Road networks, Non-belges people have very significant values in explaining outflows in Wallonia region. It is possible to add to the model data about longer unemployment rate to see in which commune real problem with employment and analyze it impact on outflows. Good idea to add average number of cars per family or total number of cars in commune, add data about accessibility to railway road, add spatial data about big cities in other countries which have border with Wallonia region to analyze outflows to other countries, along with this data about residential area also can shed light on flows. Data about total work places inside each commune can also explain part of flows in Wallonia region. As a result, in my opinion improving of the road infrastructure in the south of Wallonia region will give fruitful results and increase proximity to the northern main cities in Wallonia, thus will bring new opportunities for the people.

Commuting Flows in Wallonia

Boris Murzakaev

30 june 2017

Introduction

Data Collection

Methodology

Conclusion