ESDA Exploratory Spatial Data Analysis

Exploratory Analysis of the Camden borough in London for selected population variables: Qualification rates

OLS


Before modelling spatial dependency, let’s run a simple linear model using ordinary least squares (OLS). The dependent variable is the percentage of people with qualifications. The two independent variables are the percentage of economically active adults who are unemployed and the percentage of residents of White British ethnicity.


Call:
lm(formula = Qualification ~ Unemployed + White_British, data = OA.Census)

Residuals:
    Min      1Q  Median      3Q     Max 
-50.311  -8.014   1.006   8.958  38.046 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   47.86697    2.33574   20.49   <2e-16 ***
Unemployed    -3.29459    0.19027  -17.32   <2e-16 ***
White_British  0.41092    0.04032   10.19   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.69 on 746 degrees of freedom
Multiple R-squared:  0.4645,    Adjusted R-squared:  0.463 
F-statistic: 323.5 on 2 and 746 DF,  p-value: < 2.2e-16

This model has an adjusted R-squared of 0.463, so it explains about 46% of the variance in qualification rates. Both predictors are statistically significant (p < 2e-16): each additional percentage point of unemployment is associated with a 3.29-point fall in the qualification rate, and each additional percentage point of White British residents with a 0.41-point rise. However, the overall fit of the model and each of the coefficients may vary across space if we consider different parts of our study area. It is therefore worth examining the standardised residuals from the model to help us understand and improve our future models.
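As a minimal sketch of this step, the code below fits the same formula and extracts the adjusted R-squared and the standardised residuals. It assumes a synthetic stand-in for `OA.Census` (the real Camden data are not reproduced here), and it uses `rstandard()` as one common way of standardising residuals, which may differ from the approach used for this dashboard.

```r
# Hypothetical stand-in for OA.Census: simulated values, not the Camden census data
set.seed(42)
n <- 200
OA.Census <- data.frame(
  Unemployed    = runif(n, 0, 15),   # % unemployed (simulated)
  White_British = runif(n, 10, 90)   # % White British (simulated)
)
OA.Census$Qualification <- 48 - 3.3 * OA.Census$Unemployed +
  0.4 * OA.Census$White_British + rnorm(n, sd = 12)

# Global OLS model, same formula as in the summary above
model <- lm(Qualification ~ Unemployed + White_British, data = OA.Census)
summary(model)

# Adjusted R-squared: share of variance explained, penalised for model size
adj_r2 <- summary(model)$adj.r.squared

# Standardised residuals: residuals scaled by their estimated standard deviation
std_resids <- rstandard(model)
head(std_resids)
```

Values beyond roughly plus or minus 2 in `std_resids` mark observations that the global model fits poorly, which are the ones worth locating on a map.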


We map the residuals to see whether they show a spatial pattern across Camden.

If the residuals show a geographic pattern, an unobserved variable may also be influencing the dependent variable in the model (% with a qualification).
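The idea can be illustrated with a small simulation. In this sketch the point coordinates, the variable values and the east-west trend are all hypothetical: an omitted variable that increases with `x` is left out of the model, and it shows up as a clear spatial gradient when the standardised residuals are plotted at their locations using base graphics.

```r
# Simulated points with hypothetical coordinates (not Camden output areas)
set.seed(1)
n <- 200
pts <- data.frame(x = runif(n), y = runif(n))
pts$Unemployed    <- runif(n, 0, 15)
pts$White_British <- runif(n, 10, 90)

# An unobserved east-west trend (20 * x) affects the outcome but is not modelled
pts$Qualification <- 48 - 3.3 * pts$Unemployed + 0.4 * pts$White_British +
  20 * pts$x + rnorm(n, sd = 5)

model <- lm(Qualification ~ Unemployed + White_British, data = pts)
pts$resid <- rstandard(model)

# Group the standardised residuals into four classes and colour the points
pts$class <- cut(pts$resid, breaks = c(-Inf, -1, 0, 1, Inf))
cols <- c("darkblue", "lightblue", "salmon", "darkred")
plot(pts$x, pts$y, col = cols[pts$class], pch = 19, asp = 1,
     xlab = "x", ylab = "y", main = "Standardised OLS residuals")
legend("topright", legend = levels(pts$class), col = cols, pch = 19,
       title = "Std. residual")
```

Negative residuals cluster in the west and positive ones in the east, which is the kind of pattern that suggests a missing spatially structured variable.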

GWR

Geographically Weighted Regression (GWR) is a family of regression models in which the coefficients are allowed to vary spatially. Here the adaptive bandwidth is chosen by cross-validation, and the search settles on a quantile of about 0.0147, roughly the 11 nearest of the 749 data points. The quasi-global R2 of the GWR model is 0.7303, indicating that it explains about 73% of the variation in the data, compared with 46% for the OLS model.
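A rough sketch of the bandwidth search and model fit, using `gwr.sel()` and `gwr()` from the `spgwr` package, is given here. The data are simulated and the `coords` matrix is hypothetical. The dashboard's own call passes `OA.Census` without `coords`, which suggests a spatial object, whereas this sketch uses a plain data frame plus a coordinate matrix.

```r
library(spgwr)  # provides gwr.sel() and gwr()

# Simulated data with hypothetical coordinates (not the Camden output areas)
set.seed(7)
n <- 150
coords <- cbind(x = runif(n), y = runif(n))
dat <- data.frame(Unemployed    = runif(n, 0, 15),
                  White_British = runif(n, 10, 90))

# The Unemployed slope drifts from west to east, so one global slope fits poorly
beta_unemp <- -4 + 3 * coords[, "x"]
dat$Qualification <- 48 + beta_unemp * dat$Unemployed +
  0.4 * dat$White_British + rnorm(n, sd = 5)

# Cross-validated search for the adaptive bandwidth (a proportion of the points)
GWRbandwidth <- gwr.sel(Qualification ~ Unemployed + White_British,
                        data = dat, coords = coords, adapt = TRUE)

# Fit the GWR model with the selected adaptive bandwidth
gwr.model <- gwr(Qualification ~ Unemployed + White_British,
                 data = dat, coords = coords, adapt = GWRbandwidth,
                 hatmatrix = TRUE, se.fit = TRUE)
gwr.model
```

Each `Adaptive q` line printed during the search is one candidate bandwidth with its cross-validation score, and the quantile with the lowest score is the one passed to `gwr()`.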

Adaptive q: 0.381966 CV score: 101420.8 
Adaptive q: 0.618034 CV score: 109723.2 
Adaptive q: 0.236068 CV score: 96876.06 
Adaptive q: 0.145898 CV score: 94192.41 
Adaptive q: 0.09016994 CV score: 91099.75 
Adaptive q: 0.05572809 CV score: 88242.89 
Adaptive q: 0.03444185 CV score: 85633.41 
Adaptive q: 0.02128624 CV score: 83790.04 
Adaptive q: 0.01315562 CV score: 83096.03 
Adaptive q: 0.008130619 CV score: 84177.45 
Adaptive q: 0.01535288 CV score: 83014.34 
Adaptive q: 0.01515437 CV score: 82957.49 
Adaptive q: 0.01436908 CV score: 82857.74 
Adaptive q: 0.01440977 CV score: 82852.4 
Adaptive q: 0.01457859 CV score: 82833.25 
Adaptive q: 0.01479852 CV score: 82855.45 
Adaptive q: 0.01461928 CV score: 82829.32 
Adaptive q: 0.01468774 CV score: 82823.82 
Adaptive q: 0.01473006 CV score: 82835.89 
Adaptive q: 0.01468774 CV score: 82823.82

Call:
gwr(formula = Qualification ~ Unemployed + White_British, data = OA.Census, 
    adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss 
Adaptive quantile: 0.01468774 (about 11 of 749 data points)
Summary of GWR coefficient estimates at data points:
                  Min.  1st Qu.   Median  3rd Qu.     Max.  Global
X.Intercept.  11.08183 34.43427 45.76862 59.75372 85.01866 47.8670
Unemployed    -5.45291 -3.28308 -2.55398 -1.79413  0.77019 -3.2946
White_British -0.28046  0.19955  0.37788  0.53216  0.94678  0.4109
Number of data points: 749 
Effective number of parameters (residual: 2traceS - traceS'S): 132.6449 
Effective degrees of freedom (residual: 2traceS - traceS'S): 616.3551 
Sigma (residual: 2traceS - traceS'S): 9.903539 
Effective number of parameters (model: traceS): 94.44661 
Effective degrees of freedom (model: traceS): 654.5534 
Sigma (model: traceS): 9.610221 
Sigma (ML): 8.983902 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 5633.438 
AIC (GWR p. 96, eq. 4.22): 5508.777 
Residual sum of squares: 60452.16 
Quasi-global R2: 0.7303206
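As a quick consistency check, the quasi-global R2 can be approximately recovered from figures already printed in the two summaries, assuming it is computed as 1 - RSS/TSS. The variable names below are illustrative, and the inputs are the rounded values from the output, so the result matches only to about three decimal places.

```r
# Values copied from the OLS and GWR summaries above (rounded as printed)
ols_sigma <- 12.69     # OLS residual standard error
ols_df    <- 746       # OLS residual degrees of freedom
ols_r2    <- 0.4645    # OLS multiple R-squared
gwr_rss   <- 60452.16  # GWR residual sum of squares

# OLS residual sum of squares, then the total sum of squares it implies
ols_rss <- ols_sigma^2 * ols_df
tss     <- ols_rss / (1 - ols_r2)

# Share of total variation left unexplained by GWR, and its complement
gwr_r2 <- 1 - gwr_rss / tss
round(gwr_r2, 2)  # 0.73
```

The GWR residual sum of squares is about half the OLS one, which is where the rise from 46% to 73% comes from. The GWR fit also uses far more effective parameters (about 94 by the trace of S) than the three in the OLS model.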

GWR II

Map of the local R-squared values (local R2) across Camden.
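A minimal sketch of how local R2 values might be pulled out of a fitted `spgwr` model and plotted is shown here. It again relies on simulated data, a hypothetical `coords` matrix and an arbitrary adaptive quantile of 0.1. With `hatmatrix = TRUE`, the fitted object's `SDF` slot carries a `localR2` column.

```r
library(spgwr)

# Simulated data with hypothetical coordinates (not the Camden output areas)
set.seed(11)
n <- 150
coords <- cbind(x = runif(n), y = runif(n))
dat <- data.frame(Unemployed    = runif(n, 0, 15),
                  White_British = runif(n, 10, 90))
dat$Qualification <- 48 + (-4 + 3 * coords[, "x"]) * dat$Unemployed +
  0.4 * dat$White_British + rnorm(n, sd = 5)

# Fit with a fixed adaptive quantile (0.1 is an arbitrary choice for this sketch)
gwr.model <- gwr(Qualification ~ Unemployed + White_British,
                 data = dat, coords = coords, adapt = 0.1, hatmatrix = TRUE)

# One local R2 value per data point
local_r2 <- gwr.model$SDF$localR2
summary(local_r2)

# Darker points indicate a better local fit
shade <- (local_r2 - min(local_r2)) / (max(local_r2) - min(local_r2))
plot(coords, col = grey(1 - shade), pch = 19, asp = 1, main = "Local R2")
```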