Exploratory Analysis of the Camden borough in London for selected population variables: Qualification rates
Before modelling spatial dependency, let’s fit a simple linear model by ordinary least squares (OLS). The percentage of people with qualifications is our dependent variable, and the percentage of economically active adults who are unemployed and the percentage of White British residents are our two independent variables.
Call:
lm(formula = Qualification ~ Unemployed + White_British, data = OA.Census)
Residuals:
    Min      1Q  Median      3Q     Max
-50.311  -8.014   1.006   8.958  38.046
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   47.86697    2.33574   20.49   <2e-16 ***
Unemployed    -3.29459    0.19027  -17.32   <2e-16 ***
White_British  0.41092    0.04032   10.19   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.69 on 746 degrees of freedom
Multiple R-squared: 0.4645, Adjusted R-squared: 0.463
F-statistic: 323.5 on 2 and 746 DF, p-value: < 2.2e-16
This model has an adjusted R-squared value of 0.463, meaning it explains about 46% of the variance in qualification rates. Both predictors are statistically significant (p < 2e-16): holding the other variable constant, each additional percentage point of unemployment is associated with a qualification rate about 3.3 points lower, and each additional percentage point of White British residents with a rate about 0.41 points higher. However, these are global estimates; the overall fit of the model and each of the coefficients may vary across space if we consider different parts of our study area. Therefore, it is worth examining the standardised residuals from the model to help us understand and improve our future models.
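As a quick check on the figures quoted above, the sketch below recomputes the adjusted R-squared from the reported multiple R-squared and the sample size, and reads the coefficients as a prediction. The numbers are copied from the lm() summary; the example area (5% unemployed, 50% White British) is hypothetical and chosen only for illustration.

```r
# Values copied from the lm() summary above.
r2 <- 0.4645  # Multiple R-squared
n  <- 749     # observations: 746 residual df + 3 estimated coefficients
p  <- 2       # predictors: Unemployed and White_British

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))

# Reading the coefficients: predicted % with a qualification for a
# hypothetical output area (illustrative inputs, not real data).
b <- c(intercept = 47.86697, unemployed = -3.29459, white_british = 0.41092)
pred <- b[["intercept"]] + b[["unemployed"]] * 5 + b[["white_british"]] * 50
print(pred)
```

The first value should agree with the adjusted R-squared of 0.463 printed in the summary.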
Next, we map the residuals to see whether they show a spatial pattern across Camden.
If you notice a geographic pattern to the residuals, it is possible that an unobserved variable may also be influencing our dependent variable in the model (% with a qualification).
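The residual step can be sketched as follows. This is a minimal illustration on synthetic data, not the Camden census data: the data frame `toy` and its simulated values are assumptions made so the block runs on its own. With the real data, the same `rstandard()` call would be applied to the model fitted on OA.Census and the result joined to the output area boundaries for mapping.

```r
# Synthetic stand-in for the census table (hypothetical values).
set.seed(1)
toy <- data.frame(
  Unemployed    = runif(50, 0, 20),
  White_British = runif(50, 10, 90)
)
toy$Qualification <- 48 - 3.3 * toy$Unemployed +
  0.41 * toy$White_British + rnorm(50, sd = 12)

model <- lm(Qualification ~ Unemployed + White_British, data = toy)

# Standardised residuals: residuals scaled to unit variance.
toy$std_resid <- rstandard(model)

# Residual = observed - fitted, so a large negative value means the
# model over-predicts that area, and a large positive value means it
# under-predicts.
toy$flag <- cut(toy$std_resid,
                breaks = c(-Inf, -2, 2, Inf),
                labels = c("over-predicted", "typical", "under-predicted"))
print(table(toy$flag))
```

Colouring each output area by a column such as `std_resid` is what reveals whether over- and under-predictions cluster geographically.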
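Geographically Weighted Regression (GWR) describes a family of regression models in which the coefficients are allowed to vary spatially. The adaptive bandwidth is chosen by cross-validation; the trace below lists each candidate adaptive quantile q (the share of data points used as nearest neighbours) with its CV score, where lower is better. The quasi-global R2 of the resulting GWR model is 0.7303206, indicating that it explains 73% of the variation in the data, compared with 46% for the OLS model. Part of this gain reflects added flexibility: the GWR fit uses about 94 effective parameters (trace S), against 3 coefficients in the OLS model.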
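For reference, the difference between the two models can be written out as below. The notation is an assumption introduced here, not taken from the original analysis: y_i is the qualification rate of output area i, x_{1i} and x_{2i} are its unemployment and White British percentages, (u_i, v_i) are its coordinates, d_{ij} is the distance between areas i and j, and h_i is the local bandwidth. The Gaussian weight corresponds to the gwr.Gauss kernel named in the GWR output.

```latex
\begin{align*}
% OLS: one global set of coefficients
y_i &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \\
% GWR: coefficients depend on the location (u_i, v_i)
y_i &= \beta_0(u_i, v_i) + \beta_1(u_i, v_i)\, x_{1i}
     + \beta_2(u_i, v_i)\, x_{2i} + \varepsilon_i \\
% Local estimate by weighted least squares
\hat{\boldsymbol{\beta}}(u_i, v_i) &= (X^\top W_i X)^{-1} X^\top W_i \mathbf{y},
  \qquad W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}) \\
% Gaussian kernel weight
w_{ij} &= \exp\!\left(-\tfrac{1}{2} \left(\frac{d_{ij}}{h_i}\right)^{2}\right)
\end{align*}
```

With an adaptive bandwidth, h_i is set separately at each location so that the kernel covers the chosen share of nearest neighbours, which is why it is narrower where output areas are dense.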
Adaptive q: 0.381966 CV score: 101420.8
Adaptive q: 0.618034 CV score: 109723.2
Adaptive q: 0.236068 CV score: 96876.06
Adaptive q: 0.145898 CV score: 94192.41
Adaptive q: 0.09016994 CV score: 91099.75
Adaptive q: 0.05572809 CV score: 88242.89
Adaptive q: 0.03444185 CV score: 85633.41
Adaptive q: 0.02128624 CV score: 83790.04
Adaptive q: 0.01315562 CV score: 83096.03
Adaptive q: 0.008130619 CV score: 84177.45
Adaptive q: 0.01535288 CV score: 83014.34
Adaptive q: 0.01515437 CV score: 82957.49
Adaptive q: 0.01436908 CV score: 82857.74
Adaptive q: 0.01440977 CV score: 82852.4
Adaptive q: 0.01457859 CV score: 82833.25
Adaptive q: 0.01479852 CV score: 82855.45
Adaptive q: 0.01461928 CV score: 82829.32
Adaptive q: 0.01468774 CV score: 82823.82
Adaptive q: 0.01473006 CV score: 82835.89
Adaptive q: 0.01468774 CV score: 82823.82
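To read the trace above, the sketch below copies the printed q and CV values into two vectors, picks the quantile with the lowest CV score, and converts it into a neighbour count. The vector names are illustrative; only the numbers come from the output.

```r
# Adaptive quantiles and CV scores copied from the trace above.
q <- c(0.381966, 0.618034, 0.236068, 0.145898, 0.09016994,
       0.05572809, 0.03444185, 0.02128624, 0.01315562, 0.008130619,
       0.01535288, 0.01515437, 0.01436908, 0.01440977, 0.01457859,
       0.01479852, 0.01461928, 0.01468774, 0.01473006, 0.01468774)
cv <- c(101420.8, 109723.2, 96876.06, 94192.41, 91099.75,
        88242.89, 85633.41, 83790.04, 83096.03, 84177.45,
        83014.34, 82957.49, 82857.74, 82852.4, 82833.25,
        82855.45, 82829.32, 82823.82, 82835.89, 82823.82)

# The selected bandwidth is the quantile with the lowest CV score.
best_q <- q[which.min(cv)]

# An adaptive quantile is a share of the data, so multiplying by the
# number of output areas gives the approximate neighbours per kernel.
n <- 749
neighbours <- best_q * n
print(c(best_q = best_q, neighbours = neighbours))
```

The search settles on q = 0.01468774, which is the "about 11 of 749 data points" reported in the GWR summary.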
Call:
gwr(formula = Qualification ~ Unemployed + White_British, data = OA.Census,
    adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss
Adaptive quantile: 0.01468774 (about 11 of 749 data points)
Summary of GWR coefficient estimates at data points:
                  Min.  1st Qu.   Median  3rd Qu.     Max.  Global
X.Intercept.  11.08183 34.43427 45.76862 59.75372 85.01866 47.8670
Unemployed    -5.45291 -3.28308 -2.55398 -1.79413  0.77019 -3.2946
White_British -0.28046  0.19955  0.37788  0.53216  0.94678  0.4109
Number of data points: 749
Effective number of parameters (residual: 2traceS - traceS'S): 132.6449
Effective degrees of freedom (residual: 2traceS - traceS'S): 616.3551
Sigma (residual: 2traceS - traceS'S): 9.903539
Effective number of parameters (model: traceS): 94.44661
Effective degrees of freedom (model: traceS): 654.5534
Sigma (model: traceS): 9.610221
Sigma (ML): 8.983902
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 5633.438
AIC (GWR p. 96, eq. 4.22): 5508.777
Residual sum of squares: 60452.16
Quasi-global R2: 0.7303206
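The diagnostics above can be cross-checked against each other with a few lines of arithmetic. This is a sketch under one stated assumption: that the quasi-global R2 is computed as 1 - RSS / TSS, which lets the total sum of squares be backed out and compared with the OLS fit. All inputs are copied from the two model summaries; because the OLS residual standard error is rounded to 12.69, the recovered OLS R-squared matches 0.4645 only approximately.

```r
# Values copied from the GWR summary above.
n        <- 749        # Number of data points
rss_gwr  <- 60452.16   # Residual sum of squares
trace_s  <- 94.44661   # Effective number of parameters (model: traceS)
quasi_r2 <- 0.7303206  # Quasi-global R2

# Values copied from the earlier lm() summary.
sigma_ols <- 12.69     # Residual standard error
df_ols    <- 746       # Residual degrees of freedom

# Sigma (ML) divides the RSS by n; Sigma (model) by n - traceS.
sigma_ml    <- sqrt(rss_gwr / n)
sigma_model <- sqrt(rss_gwr / (n - trace_s))

# Assumption: quasi-global R2 = 1 - RSS / TSS, so TSS can be recovered.
tss <- rss_gwr / (1 - quasi_r2)

# Recover the OLS residual sum of squares and R-squared on the same TSS.
rss_ols <- sigma_ols^2 * df_ols
r2_ols  <- 1 - rss_ols / tss

print(c(sigma_ml = sigma_ml, sigma_model = sigma_model,
        r2_ols = r2_ols, rss_ratio = rss_gwr / rss_ols))
```

The ratio of the two residual sums of squares shows that the GWR fit leaves roughly half the unexplained variation of the OLS fit, which is the same comparison as 73% versus 46% explained.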
Local R-squared values (local R2) mapped across Camden.