Suggested citation:

Mendez C. (2020). Geographically weighted regression models: A tutorial using the spgwr package in R. R Studio/RPubs. Available at https://rpubs.com/quarcs-lab/tutorial-gwr1

This work is licensed under the Creative Commons Attribution-Share Alike 4.0 International License.

Acknowledgment:

Material adapted from multiple sources, in particular the course materials of Guy Lansley & James Cheshire (2016).

```
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse) # Modern data science workflow
library(sf)
library(sp)
library(rgdal)
library(rgeos)
library(tmap)
library(tmaptools)
library(spgwr)
library(grid)
library(gridExtra)
# Change the presentation of decimal numbers to 4 and avoid scientific notation
options(prompt="R> ", digits=4, scipen=999)
```

GWR is a nonparametric multivariate model which can indicate

**where non-stationarity may take place**across space (the study area).It can be used to identify

**how (locally weighted) regression coefficients may vary**across space (the study area).

- Explore the residuals of a linear model to understand its limitations
- Run a GWR and observe its parameters across space
- Print multiple maps in one graphic using gridExtra() with tmap

All used datasets can be obtained from Guy Lansley & James Cheshire (2016).

```
## Observations: 749
## Variables: 5
## $ OA <fct> E00004120, E00004121, E00004122, E00004123, E00004124, …
## $ White_British <dbl> 42.36, 47.20, 40.68, 49.66, 51.14, 41.42, 48.54, 48.68,…
## $ Low_Occupancy <dbl> 6.2937, 5.9322, 2.9126, 0.9259, 2.0000, 3.9326, 5.5556,…
## $ Unemployed <dbl> 1.8939, 2.6882, 1.2121, 2.8037, 3.8168, 3.8462, 4.5455,…
## $ Qualification <dbl> 73.63, 69.90, 67.58, 60.78, 65.99, 74.21, 62.45, 60.35,…
```

```
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/Carlos/GitHub/00-QuaRCS/tutorial-gwr1", layer: "Camden_oa11"
## with 749 features
## It has 1 fields
```

Using the `sf`

package

```
## Observations: 749
## Variables: 2
## $ OA11CD <chr> "E00004527", "E00004525", "E00004522", "E00004287", "E000042…
## $ geometry <POLYGON [m]> POLYGON ((530648 181230, 53..., POLYGON ((530511 181…
```

Transform it into a `sf`

object

Let us run a linear model to understand the global relationship between our variables in our study area.

- Dependent variable: The percentage of people with qualifications
- Independent (predictor) variables: the percentages of unemployed economically active adults and White British ethnicity

```
##
## Call:
## lm(formula = Qualification ~ Unemployed + White_British, data = Census.Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.31 -8.01 1.01 8.96 38.05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.8670 2.3357 20.5 <0.0000000000000002 ***
## Unemployed -3.2946 0.1903 -17.3 <0.0000000000000002 ***
## White_British 0.4109 0.0403 10.2 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.7 on 746 degrees of freedom
## Multiple R-squared: 0.464, Adjusted R-squared: 0.463
## F-statistic: 323 on 2 and 746 DF, p-value: <0.0000000000000002
```

This model has an adjusted R-squared value of 0.463. So we can assume 46% of the variance can be explained by the model. We can also observe the influences of each of the variables. However, **the overall fit of the model and each of the coefficients may vary across space** if we consider different parts of our study area. It is, therefore, worth considering the standardized residuals from the model to help us understand and improve our future models.

A residual is the difference between the predicted and observed values for an observation in the model. **Models with lower r-squared values would have greater residuals on average** as the data would not fit the modeled regression line as well. Standardized residuals are represented as Z-scores where 0 represent the predicted values.

If you plot a linear model (i.e. our model object), R will print out four different plots of which are useful for evaluating the model fit. These are very briefly summarized as:

Residuals vs Fitted: considers the relationship between the actual and the predicted data. The more dispersed the residuals are, the weaker the R2 should be. This can be useful to identify outlines too.The fit also tells us if the residuals are non-linearly distributed.

Normal Q-Q: Demonstrates the extent to which the residuals are normally distributed. Normal residuals should fit the straight line well.

Scale-Location: Shows if the residuals are spread equally across the full range of the predictors. If the values in this chart display a linear positive relationship, it suggests that the residuals spread wider and wider for greater values (this is known as

**heteroscedasticity**).Residuals vs Leverage: Identifies outliers, high-leverage points and influential observations. This plot is pretty difficult to interpret and there are other means of identifying these values.

More information on these plots is here