Updated 23rd April 2018
Interaction (or flow) data tell us about who moves where - key examples are migration (how many people have moved between one area and another over one year?) and commuting (how many people live in a given area but work within another specific area). This practical makes use of employment and commuting data provided as outputs from the Census in 2011 and it introduces methods for the visualisation of commuting flows between small areas within Liverpool. It also considers how we can consider the determinants of commuting - for example, what draws people from residential areas to areas of employment? Obvious factors might include distance (flows are likely to be larger between places which are close together) and the availability of jobs in destination locations (that is, areas of employment). Using a method called Poisson regression, you will model the relationship between flows and origin (area of residence) and destination (area of employment). Your role here is one of a transport planner seeking to consider how well existing transport infrastructure caters for workers in the Liverpool region.
The exercise is based on RStudio - the package used for part of the previous practical exercise.
The exercise entails the use of two kinds of data:
Flow data represent movements of people, information, resources, etc between places. For some background to the kinds of flow data we will be working with, see:
You will write a short executive report (max 500 words) that will answer the following questions:
In addition to answering these questions in your report, you should present as a minimum the following information:
Overall, your report should not have more than 6 maps or charts. Maps can be generated within R or QGIS.
The submission deadline is specified in the module handbook. Only electronic submission via VITAL is required.
We are going to look at commuting across small areas within Liverpool. The areas we will use are called middle layer super output areas (MSOAs). In 2011, there were 7201 MSOAs in England and Wales with a mean average population of 7787. The flow data can be downloaded from the CIDER (Centre for Interaction Data Estimation and Research) website.
You will also download data for MSOAs (i.e., not information on flows, but on the MSOAs themselves). These data will be downloaded from the Infuse website.
Click Add. You should then have seven category combinations.
Click Next.
In the Geography window expand Local Authorities (click on -). Expand Liverpool and select Middle Super Output Areas and Intermediate Zones (61 areas).
Then, click Add followed by Next.
Click Get the data, followed by Download data.
Save the data to your data directory (e.g., M:/ENVS257/Commuting).When viewed in Excel, the file should have information in columns A to L, with 63 rows.
Open R. As usual, you must set your working directory.
# For example:
setwd("M:/ENVS257/Commuting/")
Read the file in for use later on (making sure your filename is correctly specified within ‘read.csv’):
#Read csv file, skipping the first two rows:
lpoolemploy <- read.csv("Data_AGE_ECOACT_UNIT.csv", header = FALSE, skip = 2)
Now rename the columns:
colnames(lpoolemploy) <- c("ID","geocode","label","type","typeid","employeePT","employeeFT","selfempwithPT","selfempwithFT","selfempnoPT","selfempnoFT","unemploy")
You need to compute the percentage of people who are unemployed with the sum of employed and unemployed persons as the denominator. The unemployment percentages will be mapped and analysed later.
First, compute the total number of employed people (you will need this later as well as at this stage):
lpoolemploy$employed <- lpoolemploy$employeePT+lpoolemploy$employeeFT+lpoolemploy$selfempwithPT+lpoolemploy$selfempwithFT+lpoolemploy$selfempnoPT+lpoolemploy$selfempnoFT
Then the total employed and unemployed people:
lpoolemploy$total <- lpoolemploy$employed+lpoolemploy$unemploy
Next, compute the percentage unemployed:
lpoolemploy$unemployPC <- (lpoolemploy$unemploy/lpoolemploy$total)*100
Now check the file:
head(lpoolemploy)
## ID geocode label
## 1 11246 E02001347 Liverpool 001
## 2 11247 E02001348 Liverpool 002
## 3 11248 E02001349 Liverpool 003
## 4 11249 E02001350 Liverpool 004
## 5 11250 E02001351 Liverpool 005
## 6 11251 E02001352 Liverpool 006
## type typeid employeePT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ 798
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ 870
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ 780
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ 931
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ 821
## employeeFT selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy
## 1 2086 13 64 78 187 352
## 2 2253 14 37 81 183 326
## 3 2621 6 57 77 228 351
## 4 1883 13 53 60 183 415
## 5 1966 12 50 53 157 533
## 6 1750 1 49 72 165 475
## NA employed total unemployPC
## 1 NA 3257 3609 9.753394
## 2 NA 3366 3692 8.829902
## 3 NA 3859 4210 8.337292
## 4 NA 2972 3387 12.252731
## 5 NA 3169 3702 14.397623
## 6 NA 2858 3333 14.251425
You can copy the commands you enter into the R console and then re-run/modify these commands ar a later point. To do this go to File > New File > R script. Then go to File > Save As and give the new file a sensible name such as script. In that case, the file is saved as script.R
Now, as you work through the exercise, you can copy your commands into the script file and then save the script file before ending the session. To run code, you can select code in the script file and then click the Run button.
To map the data you have already worked with you need boundary files (vector polygon data). Area data can be downloaded from the website http://edina.ac.uk/ukborders/, but the area data you need are provided on VITAL > Learning Resources > Practical > LiverpoolMSOA.zip.
Download the file, unzip the downloaded data, and place the contents inside your working directory.
If you open the csv file in a text editor such as Wordpad or Notepad, you’ll see there are four header lines. We don’t want to keep any of these. The csv file can be read in as follows (note that ttw = travel to work).
#Read csv file, skipping the first four rows:
lpoolttw <- read.csv("LiverpoolTTW.csv", header = FALSE, skip = 4)
Next, preview the file by looking at the first five rows.
head(lpoolttw)
## V1 V2 V3
## 1 E02001370 E02001347 15
## 2 E02001404 E02001347 7
## 3 E02001360 E02001347 37
## 4 E02001348 E02001347 161
## 5 E02001365 E02001347 37
## 6 E02001376 E02001347 19
We will rename these to something more meaningful:
colnames(lpoolttw) <- c("Origin","Destination","FlowOD")
Check that this has worked by again looking at the first five rows of the data.
head(lpoolttw)
## Origin Destination FlowOD
## 1 E02001370 E02001347 15
## 2 E02001404 E02001347 7
## 3 E02001360 E02001347 37
## 4 E02001348 E02001347 161
## 5 E02001365 E02001347 37
## 6 E02001376 E02001347 19
Next, we will summarise the variables.
summary(lpoolttw)
## Origin Destination FlowOD
## E02001347: 61 E02001347: 61 Min. : 1.00
## E02001353: 61 E02001350: 61 1st Qu.: 6.00
## E02001354: 61 E02001351: 61 Median : 13.00
## E02001359: 61 E02001354: 61 Mean : 32.57
## E02001362: 61 E02001355: 61 3rd Qu.: 30.00
## E02001363: 61 E02001358: 61 Max. :722.00
## (Other) :3272 (Other) :3272 NA's :2
The two NA’s indicate rows at the end of the data which were textual information included in the downloaded file. You don’t need to worry about these for now.
Note that the miniumum, 1st quartile and median FlowOD values are 1, 6, and 13 respectively. In other words, most flows are small.
If you want to save your workspace, so you can open it up later on, you can use the save.image function.
# This will save all objects currently loaded in the workspace
save.image(file = "ttw.RData")
After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.
# Remember that you will need to set your working directory belocatorfore trying
# to load - otherwise R will not know where to find your RData file
load("ttw.RData")
As the next step, we need to import the Liverpool MSOAs shapefile (vector polygon boundaries) downloaded earlier.
We will need to make use of functions which are not contained in the base R installation.
You need to load ‘maptools’ and ‘rgeos’. These packages include functions for reading and manipulating GIS data.
# Set working directory
setwd("M:/ENVS257/Commuting/")
#Choose the CRAN mirror where the packages are going to be downloaded from
options("repos" = c(CRAN = "http://www.stats.bris.ac.uk/R/"))
#Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())
#Load maptools
library("maptools")
#Load stringr
library("stringr")
#Load rgeos
library("rgeos")
Now we will read in the MSOAs data into a type of object called a SpatialPolygonsDataFrame.
lpoolMSOA <- readShapePoly("LiverpoolMSOA.shp")
## Warning: use rgdal::readOGR or sf::st_read
We can see what the SpatialPolygonsDataFrame looks like using a the plot() function.
# A basic plot of the MSOAs SpatialPolygonsDataFrame
plot(lpoolMSOA)
Now we have a map of MSOAs in Liverpool we can consider mapping commuting flows between areas.
First, we need to extract several elements from the LiverpoolMSOA shapefile attribute table. We do this using SpatialPointsDataFrame. To map flows between zones we also need the centroids of the MSOAs, and this is achieved using the coordinates command.
MSOAXY <- SpatialPointsDataFrame(coordinates(lpoolMSOA),data=as(lpoolMSOA, "data.frame")[c("geo_code")])
MSOAXY is then joined to the lpoolttw and to the lpoolemploy data. We need to join these three files twice - once linking geo_code (in MSOAXY) to Destination in lpoolttw to create Distttw1 and then geocode (in lpoolemploy) to Destination (in the new file Distttw1) to creat Distttw2. Then the same steps are repeated for Origin to create Distttw3 and Distttw4.
So, first we’ll join to Destination and then the output from this to Origin:
Distttw1 <- merge(lpoolttw,MSOAXY,by.x="Destination",by.y="geo_code")
Distttw2 <- merge(lpoolemploy,Distttw1,by.x="geocode",by.y="Destination",all=TRUE)
head(Distttw2)
## geocode ID label
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
## type typeid employeePT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## employeeFT selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy
## 1 2086 13 64 78 187 352
## 2 2086 13 64 78 187 352
## 3 2086 13 64 78 187 352
## 4 2086 13 64 78 187 352
## 5 2086 13 64 78 187 352
## 6 2086 13 64 78 187 352
## NA employed total unemployPC Origin FlowOD coords.x1 coords.x2
## 1 NA 3257 3609 9.753394 E02001370 15 338686.2 396978
## 2 NA 3257 3609 9.753394 E02001348 161 338686.2 396978
## 3 NA 3257 3609 9.753394 E02001365 37 338686.2 396978
## 4 NA 3257 3609 9.753394 E02001404 7 338686.2 396978
## 5 NA 3257 3609 9.753394 E02001360 37 338686.2 396978
## 6 NA 3257 3609 9.753394 E02001371 24 338686.2 396978
And now the equivalent for Origin (Ignore the warning message which appears after creating Distttw4):
Distttw3 <- merge(Distttw2,MSOAXY,by.x="Origin",by.y="geo_code")
Distttw4 <- merge(lpoolemploy,Distttw3,by.x="geocode",by.y="Origin",all=TRUE)
## Warning in merge.data.frame(lpoolemploy, Distttw3, by.x = "geocode", by.y =
## "Origin", : column name 'geocode' is duplicated in the result
head(Distttw4)
## geocode ID.x label.x
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
## type.x typeid.x employeePT.x
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ 829
## employeeFT.x selfempwithPT.x selfempwithFT.x selfempnoPT.x selfempnoFT.x
## 1 2086 13 64 78 187
## 2 2086 13 64 78 187
## 3 2086 13 64 78 187
## 4 2086 13 64 78 187
## 5 2086 13 64 78 187
## 6 2086 13 64 78 187
## unemploy.x NA.x employed.x total.x unemployPC.x geocode ID.y
## 1 352 NA 3257 3609 9.753394 E02001350 11249
## 2 352 NA 3257 3609 9.753394 E02001401 11299
## 3 352 NA 3257 3609 9.753394 E02001366 11265
## 4 352 NA 3257 3609 9.753394 E02001376 11275
## 5 352 NA 3257 3609 9.753394 E02001382 11280
## 6 352 NA 3257 3609 9.753394 E02001361 11260
## label.y type.y typeid.y
## 1 Liverpool 004 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 2 Liverpool 055 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 3 Liverpool 020 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 4 Liverpool 030 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 5 Liverpool 036 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 6 Liverpool 015 Middle Super Output Areas and Intermediate Zones MSOAIZ
## employeePT.y employeeFT.y selfempwithPT.y selfempwithFT.y selfempnoPT.y
## 1 780 1883 13 53 60
## 2 781 2275 17 105 96
## 3 923 2355 17 64 90
## 4 803 1572 18 62 75
## 5 835 2215 16 92 76
## 6 848 1863 10 45 70
## selfempnoFT.y unemploy.y NA.y employed.y total.y unemployPC.y FlowOD
## 1 183 415 NA 2972 3387 12.252731 50
## 2 186 232 NA 3460 3692 6.283857 13
## 3 236 582 NA 3685 4267 13.639559 11
## 4 118 635 NA 2648 3283 19.342065 9
## 5 234 200 NA 3468 3668 5.452563 3
## 6 170 412 NA 3006 3418 12.053833 4
## coords.x1.x coords.x2.x coords.x1.y coords.x2.y
## 1 340083.4 396003.7 338686.2 396978
## 2 342699.6 385411.4 338686.2 396978
## 3 338261.9 392714.7 338686.2 396978
## 4 336835.5 391054.2 338686.2 396978
## 5 341247.0 389565.8 338686.2 396978
## 6 337033.7 393676.8 338686.2 396978
Next, replace the column labels:
colnames(Distttw4) <- c("Origin","OriginID","Originlabel","Origintype","OrigintypeID","OemployeePT","OemployeeFT","OselfempwithPT","OselfempwithFT","OselfempnoPT","OselfempnoFT","Ounemploy","NA","Oemployed","Ototal","OunemployPC","Destination","DestinationID","Destinationlabel","Destinationtype","DestinationtypeID","DemployeePT","DemployeeFT","DselfempwithPT","DselfempwithFT","DselfempnoPT","DselfempnoFT","Dunemploy","NA","Demployed","Dtotal","DunemployPC","FlowOD","Deast","Dnorth","Oeast","Onorth")
head(Distttw4)
## Origin OriginID Originlabel
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
## Origintype OrigintypeID
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ
## OemployeePT OemployeeFT OselfempwithPT OselfempwithFT OselfempnoPT
## 1 829 2086 13 64 78
## 2 829 2086 13 64 78
## 3 829 2086 13 64 78
## 4 829 2086 13 64 78
## 5 829 2086 13 64 78
## 6 829 2086 13 64 78
## OselfempnoFT Ounemploy NA Oemployed Ototal OunemployPC Destination
## 1 187 352 NA 3257 3609 9.753394 E02001350
## 2 187 352 NA 3257 3609 9.753394 E02001401
## 3 187 352 NA 3257 3609 9.753394 E02001366
## 4 187 352 NA 3257 3609 9.753394 E02001376
## 5 187 352 NA 3257 3609 9.753394 E02001382
## 6 187 352 NA 3257 3609 9.753394 E02001361
## DestinationID Destinationlabel
## 1 11249 Liverpool 004
## 2 11299 Liverpool 055
## 3 11265 Liverpool 020
## 4 11275 Liverpool 030
## 5 11280 Liverpool 036
## 6 11260 Liverpool 015
## Destinationtype DestinationtypeID
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ
## DemployeePT DemployeeFT DselfempwithPT DselfempwithFT DselfempnoPT
## 1 780 1883 13 53 60
## 2 781 2275 17 105 96
## 3 923 2355 17 64 90
## 4 803 1572 18 62 75
## 5 835 2215 16 92 76
## 6 848 1863 10 45 70
## DselfempnoFT Dunemploy NA Demployed Dtotal DunemployPC FlowOD Deast
## 1 183 415 NA 2972 3387 12.252731 50 340083.4
## 2 186 232 NA 3460 3692 6.283857 13 342699.6
## 3 236 582 NA 3685 4267 13.639559 11 338261.9
## 4 118 635 NA 2648 3283 19.342065 9 336835.5
## 5 234 200 NA 3468 3668 5.452563 3 341247.0
## 6 170 412 NA 3006 3418 12.053833 4 337033.7
## Dnorth Oeast Onorth
## 1 396003.7 338686.2 396978
## 2 385411.4 338686.2 396978
## 3 392714.7 338686.2 396978
## 4 391054.2 338686.2 396978
## 5 389565.8 338686.2 396978
## 6 393676.8 338686.2 396978
You will now map flows as lines with thicknesses indicating the size of the flows:
# Plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(Distttw4[, 1])) {lines(Distttw4 [i, c(36, 34)], Distttw4 [i, c(37, 35)], lwd=0.2 + Distttw4 [i,33]/100, col="blue")}
The map is almost untelligible at present, so we will modify the flows to make the picture clearer. Firstly, we’ll remove the internal flows (people who lived and worked within the same MSOA).
# Compute the distances between the Origins and Destinations
Distttw4$Dist <- sqrt((Distttw4$Oeast-Distttw4$Deast)**2+(Distttw4$Onorth-Distttw4$Dnorth)**2)
# Retain rows where Dist > 0
Distttw0 <- Distttw4[ which(Distttw4$Dist > 0),]
We will further refine the data to exclude all flows of <= 250
# Retain rows where FlowOD > 250
DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 250),]
# Plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(DistttwCut[, 1])) {lines(DistttwCut [i, c(36, 34)], DistttwCut [i, c(37, 35)], lwd=0.2 + DistttwCut [i,33]/100, col="blue")}
That’s better — it is now possible to discern some major trends in commuting across Liverpool. Try some variants other flow cutoffs (where DistttwCutTest is the new (output) file and Distttw0 is the input).
Next we will map the percentage of unemployed people. Firstly, you need to join the employment data to the MSOA areas so that the employment counts and percentages can be mapped:
lpoolMSOA@data=data.frame(lpoolMSOA@data, lpoolemploy[match(lpoolMSOA@data$geo_code, lpoolemploy$geocode),])
Now we’ll map the percentage of people who were unemployed in 2011 (unemployPC) and add a legend to the map. At this stage, we need two new libraries - RColorBrewer and classInt.
#Download / install and then load RColorBrewer
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())
#Download / install and then load RColorBrewer
library("classInt")
We select a class interval scheme to display the values, and add a legend.
fj5 <- classIntervals(lpoolMSOA$unemployPC, n=5, style="fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(lpoolMSOA, col=fj5Colours, pch=19)
legend("topleft", fill=attr(fj5Colours, "palette"),legend=names(attr(fj5Colours, "table")), bty="n")
To map rates, it’s useful to be able to round the values for inclusion in the legend. The first line of the code above can be modified to do this; we can also change the “topleft” element under the legend to more precisely position the legend (obviously, changing the x and y coordinates moves the legend); you will also add a scale bar and a north arrow:
fj5 <- classIntervals(round(lpoolMSOA$unemployPC,digits=2), n = 5, style = "fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(lpoolMSOA, col=fj5Colours, pch=19)
legend(x = 329500, y = 387500, fill=attr(fj5Colours, "palette"),legend=names(attr(fj5Colours, "table")), bty="n")
# Add a scale bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(331000, 381500), scale = 4000, fill = c("white", "black"), plot.grid = F)
text(331000, 381000, "0", cex = 0.6)
text(333000, 381000, "2", cex = 0.6)
text(335000, 381000, "4km", cex = 0.6)
# Add a North arrow
SpatialPolygonsRescale(layout.north.arrow(1), offset= c(344000,395000), scale = 2000, plot.grid=F)
The six figure numbers are eastings and northings and these can be changed with the result that the position of the features they refer to change. To determine the coordinates of a location type:
locator()
Click on the plot at the point for which you want locational information. Then click the ESC key. The coordinates of the location will then appear in the R console.
Note that you can export the lpoolMSOA file into shapefile format which can then be opened in QGIS. This is done with:
writeSpatialShape(lpoolMSOA,"lpmsoa")
## Warning: use rgdal::readOGR or sf::st_read
## Warning: use rgdal::readOGR or sf::st_read
This creates the output shapefile ‘lpmsoa’. If you prefer to use QGIS to make maps of MSOA area features you can do so following gudiance provided in the previous practical exercises.
One useful way of exploring the spatial patterning in data values (here, unemployment percentages) is to compute a statistic called the Moran’s I spatial autocorrelation coefficient. Positive values of I indicate clustering of similar values, negative values of I indicate clustering of dissimilar values and values close to zero indicate zero spatial autocorrelation (a ‘random’ spatial pattern).
First, we need to install and load the package ‘spdep’:
#Download / install and then load spdep
install.packages("spdep")
library("spdep")
Next we need to create list indicating which MSOAs share boundaries with which other MSOAs. This is done using the command ‘poly2nb’. This must then be converted into a weights file using ‘nblistw’. The weights file represents the spatial relationship between MSOAs and if an MSOA borders five other MSOAs then each neighbouring MSOA is given a weight of 0.2 since the weights for each set of neighbours of each MSOA sum to one and 1/5 = 0.2. With this weights file we can measure the average difference between unemployment percentages at each MSOA and at the neighbouring MSOAs. A value of I close to one would indicate that the unemployment percentage values at each MSOA tend to be similar to the unemployment percentage values at other MSOAs with which they share a border - thus, the values cluster.
MSOA.nb <- poly2nb(lpoolMSOA)
MSOA.wt <- nb2listw(MSOA.nb)
MSOA.wt
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 61
## Number of nonzero links: 302
## Percentage nonzero weights: 8.116098
## Average number of links: 4.95082
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 61 3721 61 26.82893 251.0616
#Summarise the weights values
summary(unlist(MSOA.wt$weights))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1250 0.1667 0.1667 0.2020 0.2500 1.0000
Using the new weights file we can now compute I:
moransUnemploy <- moran.test(lpoolMSOA$unemployPC, MSOA.wt)
moransUnemploy
##
## Moran I test under randomisation
##
## data: lpoolMSOA$unemployPC
## weights: MSOA.wt
##
## Moran I statistic standard deviate = 5.9407, p-value = 1.419e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.466154721 -0.016666667 0.006605424
The value of 0.466 indicates a moderate level of spatial clustering - that is, neighbouring unemployment percentages tend to be quite similar on average. The p-value of 1.419e-09 indicates that the clustering is highly significant (that is, it is unlikely to have occurred by chance) - values closer to zero indicate a higher significance level.
Values such as 1.419e-09 may not mean much to you. This is scientific notation and is a convenient way of displaying very small or very large values. In the case of 1.419e-09, this means that the decimal place is moved 9 positions to the left, so 1.419e-09 = 0.000000001419.
Next, we are going to explore how the distance between MSOAs and their populations are related to the size of commuting flows. We might expect flows to be larger between MSOAs which are close together. Also flows to MSOAs which have large populations (and, by implications, more housing) may also be large. The relationships between variables such as flow size and distance can be analysed using regression. The regression approach you may be familiar with from Excel, for example, is called ordinary least squares (OLS) regression. OLS regression is used to explore how a set of variables (‘independent variables’, for example, population size of MSOAs and distance between MSOAs) are related to a single variable (for example, flows between MSOAs)
OLS is used here to explore how flows relate to distance, origin populations (number of employed persons), and destination populations.
We run the OLS model with intra-MSOA (internal) flows removed. Research on commuting (and migration) flows suggests that we should transform the variables using logs (that is, they are put on a log scale; zeros can’t be logged, so we add 0.5 to each value):
lreg_ttw <- lm(log(FlowOD+0.5) ~ log(Dist+0.5) + log(Oemployed+0.5) + log(Demployed+0.5), data = Distttw0)
summary(lreg_ttw)
##
## Call:
## lm(formula = log(FlowOD + 0.5) ~ log(Dist + 0.5) + log(Oemployed +
## 0.5) + log(Demployed + 0.5), data = Distttw0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7790 -0.7238 -0.0659 0.5940 3.2668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.67385 0.99651 4.690 2.83e-06 ***
## log(Dist + 0.5) -0.86549 0.03134 -27.619 < 2e-16 ***
## log(Oemployed + 0.5) 0.91814 0.08527 10.768 < 2e-16 ***
## log(Demployed + 0.5) -0.25642 0.08485 -3.022 0.00253 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.089 on 3571 degrees of freedom
## Multiple R-squared: 0.1972, Adjusted R-squared: 0.1965
## F-statistic: 292.4 on 3 and 3571 DF, p-value: < 2.2e-16
The estimate values (parameter estimates) for Dist, Oemployed and Oemployed are the key figures here.
The value for Dist is negative, while the value for Oemployed is positive and that for Demployed is negative. This indicates that Dist (distance) has a negative effect - as distance increases, flows decrease. This fits with what we’d expect. The positive value for Oemployed also corresponds to what we’d expect - commuting flows are likely to be larger FROM MSOAs with large populations. The negative value for Demployed seems less intuitive until we note that the employment data we are using are for area of residence and NOT area of workplace. Hence, there may not be a ‘pull factor’ for areas with large numbers of employed residents - instead the ‘pull’ is likely to be towards areas with large numbers of jobs. It is possible to instead assess flows by MSOA of workplace (the area in which people work) rather than (as here) the MSOA of residence (where people live), but will work only with residence data in this exercise.
The astrices (** or ***) at the end of the columns indicate that all of the parameter estimates are statistically significant.
But, there is a problem with using OLS… Flows are counts which are unlikely to have a normal distribution - there isn’t an equal proportion of very small and very large flows, with the bulk in the middle range. Instead, there are many small flows and few large flows. OLS is designed to work with data which have a normal distribution, and so its application to count data like commuting flows is not sensible… An alternative approach is Poisson regression, which is appropriate for flow data. We can apply it using the glm function. Again, we exclude internal flows.
preg_ttw <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = Distttw0, family = poisson())
summary(preg_ttw)
##
## Call:
## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(),
## data = Distttw0)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.316 -4.861 -2.780 0.079 49.810
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.262e+00 2.206e-02 193.21 <2e-16 ***
## Dist -1.520e-04 1.269e-06 -119.76 <2e-16 ***
## Oemployed 2.953e-04 5.139e-06 57.46 <2e-16 ***
## Demployed -3.459e-04 4.938e-06 -70.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 171205 on 3574 degrees of freedom
## Residual deviance: 147039 on 3571 degrees of freedom
## AIC: 162888
##
## Number of Fisher Scoring iterations: 6
As for OLS, the parameter estimate for Dist is negative, that for Oemployed is positive and again that for Demployed is negative.
There are still problems with applying Poisson regression but it is conceptually an improvement on OLS regression.
We know that data on small flows are unreliable. So, try re-running the Poisson regression model with different (small) flow thresholds (following the steps outlined for mapping flows). Use thresholds of 5 and 10 and compare results to those using all data (that is, a threshold of zero). As a reminder, this can be done with (for the example of a cut-off of 5 people):
DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 5),]
preg_ttw5 <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = DistttwCut, family = poisson())
summary(preg_ttw5)
##
## Call:
## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(),
## data = DistttwCut)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.950 -4.953 -2.910 0.152 49.087
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.584e+00 2.216e-02 206.92 <2e-16 ***
## Dist -9.288e-05 1.282e-06 -72.47 <2e-16 ***
## Oemployed 1.970e-04 5.249e-06 37.53 <2e-16 ***
## Demployed -3.718e-04 4.964e-06 -74.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 127232 on 2673 degrees of freedom
## Residual deviance: 114571 on 2670 degrees of freedom
## AIC: 127848
##
## Number of Fisher Scoring iterations: 6
If you want to add additional variables (perhaps downloaded from Infuse), you could do this as follows (for a file named ‘lpoolNEW’ with the same format as the employment data and with only two sets of counts):
colnames(lpoolNEW) <- c(“DID”,“Dgeocode”,“Dlabel”,“Dtype”,“Dtypeid”,“Dcount1”,“Dcount2”)
Distttw5 <- merge(lpoolNEW,Distttw0,by.x=“geocode”,by.y=“Destination”,all=TRUE)
Then rename to indicate origins and join to the new Disttw5 file created above:
colnames(lpoolNEW) <- c(“OID”,“Ogeocode”,“Olabel”,“Otype”,“Otypeid”,“Ocount1”,“Ocount2”)
Distttw6 <- merge(lpoolNEW,Distttw5,by.x=“geocode”,by.y=“Origin”,all=TRUE)
Giving the final file Distttw6 (for Dist > 0) containing the flows, distances, employment data and the new counts.
At the end of a session make sure you have daved your data:
save.image(file = "ttw.RData")
When starting a new session, set your working directory and open your data:
setwd("M:/ENVS257/Commuting/")
load("ttw.RData")
You can open your script file (containing your pasted commands) with File > Open and then locate the script (.R) file.
You will also need to reload the libraries - a sensible approach is to copy the library-related code (e.g., install.packages… and library…) into a section of your script file and then run these lines of code after the setwd and load lines detailed above. Then you can re-run any other lines of code (e.g., map creation, statistics, etc).
Note that maps themselves are not saved - the underlying data are and, if you save the code in a script file, then you can simply re-map the existing data by re-running the appropriate code.
The various analysis that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, however, examples of any code that you need to complete your report are contained in this document. However, you may wish to make use of the knowledge developed in previous practicals.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.