Dynamic Cities

Updated 23rd April 2018

Introduction

Interaction (or flow) data tell us about who moves where - key examples are migration (how many people have moved between one area and another over one year?) and commuting (how many people live in a given area but work within another specific area). This practical makes use of employment and commuting data provided as outputs from the Census in 2011 and it introduces methods for the visualisation of commuting flows between small areas within Liverpool. It also considers how we can consider the determinants of commuting - for example, what draws people from residential areas to areas of employment? Obvious factors might include distance (flows are likely to be larger between places which are close together) and the availability of jobs in destination locations (that is, areas of employment). Using a method called Poisson regression, you will model the relationship between flows and origin (area of residence) and destination (area of employment). Your role here is one of a transport planner seeking to consider how well existing transport infrastructure caters for workers in the Liverpool region.

The exercise is based on RStudio - the package used for part of the previous practical exercise.

The exercise entails the use of two kinds of data:

Area data (numbers of people who in employment or unemployed)
Flow (or interaction) data

Flow data represent movements of people, information, resources, etc between places. For some background to the kinds of flow data we will be working with, see:

http://census.ukdataservice.ac.uk/get-data/flow-data.aspx

Project Tasks

You will write a short executive report (max 500 words) that will answer the following questions:

What are the main commuting patterns in Liverpool?
Where are flows largest? What might this suggest?
How does the geography of employment/unemployment relate to flows?
How important are distance, origin population, destination population (employed people) and other variables (e.g., unemployment) in explaining commuting flows between the areas of Liverpool?

In addition to answering these questions in your report, you should present as a minimum the following information:

A map of percentages of unemployed people.
A map showing commuting flows between MSOAs in Liverpool (with your own flow thresholds).
A table summarising Poisson regression outputs.

Overall, your report should not have more than 6 maps or charts. Maps can be generated within R or QGIS.

The submission deadline is specified in the module handbook. Only electronic submission via VITAL is required.

Downloading flow data

We are going to look at commuting across small areas within Liverpool. The areas we will use are called middle layer super output areas (MSOAs). In 2011, there were 7201 MSOAs in England and Wales with a mean average population of 7787. The flow data can be downloaded from the CIDER (Centre for Interaction Data Estimation and Research) website.

Visit the CIDER website: http://cider.census.ac.uk/
Click on wicid - the flexible query builder. Then click Login using Shibboleth. Login with University of Liverpool. Enter your University of Liverpool username and password. Click Continue.
Click on the Data tab. Click ‘Select by dataset and table’. Select ‘Commuting and journey to education data’, then select ‘2011’.
Select ‘2011 (SWS) MSOA/SOA/Intermediate Zone [Location of usual residence and place of work by sex] - WU01EW - Open’
Select ‘All usual residents aged 16 and over in employment the week before the census’.
Click on the Geography tab. Select ‘Select or edit origins’. Click on ‘List selection’, and click on ‘England and Wales Middle Super Output Areas (MSOA), Northern Ireland Super Output Areas (SOA), Scotland Intermediate Zones 2011’.
Click on the icon below ‘Do you want to select all areas that fall within another larger area?’. Select ‘UK Local Authorities (merged) 2011’. Click ‘Confirm that you wish to proceed using this geography’.
Click ‘Next Page’ several times until you reach Liverpool. Click on the button next to ‘267/Liverpool’. Click ‘Add chosen areas’.
Click ‘select geography destinations’. Click ‘Copy selection’ and then click ‘Set the destinations to be the same as the currently selected origins’.
Click ‘Produce output’, then click ‘Run your query’. Next, click ‘Continue to Output pages’.
Select ‘Tabular output’. Under ‘Origin and destination labels’ select Change. Change the 1st label setting to ‘ONS/GSS area code’ and click ‘Change labels’. Specify the Output layout to ‘Origin-destination pair list’. Under ‘Output format’ select ‘Comma separated values’. Then, click ‘Preview Output and Download’
Click ‘Download output file’. Change the filename to ‘LiverpoolTTW.csv’ and then click ‘Download now’. Save the file to your working directory (or create a new one).

Downloading counts for areas

You will also download data for MSOAs (i.e., not information on flows, but on the MSOAs themselves). These data will be downloaded from the Infuse website.

Visit the website http://infuse.mimas.ac.uk/ - click on 2011 Census Data
On the InFuse homepage, click on 2011 Census data.Click Topics.
From the list of Topics on the left hand side of the screen, select ‘Economic Activity’.
You should click Select under the first of the 40 topic combinations (Age, Economic activity). Then, click Next. In the Categories window click the box next to Age 16 to 74, then expand Economicaly active. Next, you will need to expand the fields by clicking the required boxes (click - for the fields indicated) and then tick the boxes next to Employee > Part-time, Employee > Full-time, Self-employed with employees > Part-time, Self-employed with employees > Full-time, Self-employed without employees > Part-time, Self-employed without employees > Full-time, Unemployed.

Click Add. You should then have seven category combinations.

Click Next.

In the Geography window expand Local Authorities (click on -). Expand Liverpool and select Middle Super Output Areas and Intermediate Zones (61 areas).

Then, click Add followed by Next.

Click Get the data, followed by Download data.

Save the data to your data directory (e.g., M:/ENVS257/Commuting).When viewed in Excel, the file should have information in columns A to L, with 63 rows.

Importing the area count data into R

Open R. As usual, you must set your working directory.

# For example:
setwd("M:/ENVS257/Commuting/")

Read the file in for use later on (making sure your filename is correctly specified within ‘read.csv’):

#Read csv file, skipping the first two rows:
lpoolemploy <- read.csv("Data_AGE_ECOACT_UNIT.csv", header = FALSE, skip = 2)

Now rename the columns:

colnames(lpoolemploy) <- c("ID","geocode","label","type","typeid","employeePT","employeeFT","selfempwithPT","selfempwithFT","selfempnoPT","selfempnoFT","unemploy")

You need to compute the percentage of people who are unemployed with the sum of employed and unemployed persons as the denominator. The unemployment percentages will be mapped and analysed later.

First, compute the total number of employed people (you will need this later as well as at this stage):

lpoolemploy$employed <- lpoolemploy$employeePT+lpoolemploy$employeeFT+lpoolemploy$selfempwithPT+lpoolemploy$selfempwithFT+lpoolemploy$selfempnoPT+lpoolemploy$selfempnoFT

Then the total employed and unemployed people:

lpoolemploy$total <- lpoolemploy$employed+lpoolemploy$unemploy

Next, compute the percentage unemployed:

lpoolemploy$unemployPC <- (lpoolemploy$unemploy/lpoolemploy$total)*100

Now check the file:

head(lpoolemploy)

##      ID   geocode         label
## 1 11246 E02001347 Liverpool 001
## 2 11247 E02001348 Liverpool 002
## 3 11248 E02001349 Liverpool 003
## 4 11249 E02001350 Liverpool 004
## 5 11250 E02001351 Liverpool 005
## 6 11251 E02001352 Liverpool 006
##                                               type typeid employeePT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ        798
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ        870
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ        780
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ        931
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ        821
##   employeeFT selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy
## 1       2086            13            64          78         187      352
## 2       2253            14            37          81         183      326
## 3       2621             6            57          77         228      351
## 4       1883            13            53          60         183      415
## 5       1966            12            50          53         157      533
## 6       1750             1            49          72         165      475
##   NA employed total unemployPC
## 1 NA     3257  3609   9.753394
## 2 NA     3366  3692   8.829902
## 3 NA     3859  4210   8.337292
## 4 NA     2972  3387  12.252731
## 5 NA     3169  3702  14.397623
## 6 NA     2858  3333  14.251425

Saving R commands in a script

You can copy the commands you enter into the R console and then re-run/modify these commands ar a later point. To do this go to File > New File > R script. Then go to File > Save As and give the new file a sensible name such as script. In that case, the file is saved as script.R

Now, as you work through the exercise, you can copy your commands into the script file and then save the script file before ending the session. To run code, you can select code in the script file and then click the Run button.

Downloading area data

To map the data you have already worked with you need boundary files (vector polygon data). Area data can be downloaded from the website http://edina.ac.uk/ukborders/, but the area data you need are provided on VITAL > Learning Resources > Practical > LiverpoolMSOA.zip.

Download the file, unzip the downloaded data, and place the contents inside your working directory.

Importing the flow data into R

If you open the csv file in a text editor such as Wordpad or Notepad, you’ll see there are four header lines. We don’t want to keep any of these. The csv file can be read in as follows (note that ttw = travel to work).

#Read csv file, skipping the first four rows:
lpoolttw <- read.csv("LiverpoolTTW.csv", header = FALSE, skip = 4)

Next, preview the file by looking at the first five rows.

head(lpoolttw)

##          V1        V2  V3
## 1 E02001370 E02001347  15
## 2 E02001404 E02001347   7
## 3 E02001360 E02001347  37
## 4 E02001348 E02001347 161
## 5 E02001365 E02001347  37
## 6 E02001376 E02001347  19

V1 is the code for the Origin MSOA
V2 is the code for the Destination MSOA
V3 is the number of persons who lived in the Origin MSOA and worked in the Destination MSOA in 2011

We will rename these to something more meaningful:

colnames(lpoolttw) <- c("Origin","Destination","FlowOD")

Check that this has worked by again looking at the first five rows of the data.

head(lpoolttw)

##      Origin Destination FlowOD
## 1 E02001370   E02001347     15
## 2 E02001404   E02001347      7
## 3 E02001360   E02001347     37
## 4 E02001348   E02001347    161
## 5 E02001365   E02001347     37
## 6 E02001376   E02001347     19

Next, we will summarise the variables.

summary(lpoolttw)

##        Origin        Destination       FlowOD      
##  E02001347:  61   E02001347:  61   Min.   :  1.00  
##  E02001353:  61   E02001350:  61   1st Qu.:  6.00  
##  E02001354:  61   E02001351:  61   Median : 13.00  
##  E02001359:  61   E02001354:  61   Mean   : 32.57  
##  E02001362:  61   E02001355:  61   3rd Qu.: 30.00  
##  E02001363:  61   E02001358:  61   Max.   :722.00  
##  (Other)  :3272   (Other)  :3272   NA's   :2

The two NA’s indicate rows at the end of the data which were textual information included in the downloaded file. You don’t need to worry about these for now.

Note that the miniumum, 1st quartile and median FlowOD values are 1, 6, and 13 respectively. In other words, most flows are small.

If you want to save your workspace, so you can open it up later on, you can use the save.image function.

# This will save all objects currently loaded in the workspace
save.image(file = "ttw.RData")

After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.

# Remember that you will need to set your working directory belocatorfore trying
# to load - otherwise R will not know where to find your RData file
load("ttw.RData")

Importing the MSOA boundaries

As the next step, we need to import the Liverpool MSOAs shapefile (vector polygon boundaries) downloaded earlier.

We will need to make use of functions which are not contained in the base R installation.

You need to load ‘maptools’ and ‘rgeos’. These packages include functions for reading and manipulating GIS data.

# Set working directory
setwd("M:/ENVS257/Commuting/")

#Choose the CRAN mirror where the packages are going to be downloaded from
options("repos" = c(CRAN = "http://www.stats.bris.ac.uk/R/"))

#Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())

#Load maptools
library("maptools")

#Load stringr
library("stringr")

#Load rgeos
library("rgeos")

Now we will read in the MSOAs data into a type of object called a SpatialPolygonsDataFrame.

lpoolMSOA <- readShapePoly("LiverpoolMSOA.shp")

## Warning: use rgdal::readOGR or sf::st_read

We can see what the SpatialPolygonsDataFrame looks like using a the plot() function.

# A basic plot of the MSOAs SpatialPolygonsDataFrame
plot(lpoolMSOA)

Now we have a map of MSOAs in Liverpool we can consider mapping commuting flows between areas.

First, we need to extract several elements from the LiverpoolMSOA shapefile attribute table. We do this using SpatialPointsDataFrame. To map flows between zones we also need the centroids of the MSOAs, and this is achieved using the coordinates command.

MSOAXY <- SpatialPointsDataFrame(coordinates(lpoolMSOA),data=as(lpoolMSOA, "data.frame")[c("geo_code")])

MSOAXY is then joined to the lpoolttw and to the lpoolemploy data. We need to join these three files twice - once linking geo_code (in MSOAXY) to Destination in lpoolttw to create Distttw1 and then geocode (in lpoolemploy) to Destination (in the new file Distttw1) to creat Distttw2. Then the same steps are repeated for Origin to create Distttw3 and Distttw4.

So, first we’ll join to Destination and then the output from this to Origin:

Distttw1 <- merge(lpoolttw,MSOAXY,by.x="Destination",by.y="geo_code")
Distttw2 <- merge(lpoolemploy,Distttw1,by.x="geocode",by.y="Destination",all=TRUE)
head(Distttw2)

##     geocode    ID         label
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
##                                               type typeid employeePT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ        829
##   employeeFT selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy
## 1       2086            13            64          78         187      352
## 2       2086            13            64          78         187      352
## 3       2086            13            64          78         187      352
## 4       2086            13            64          78         187      352
## 5       2086            13            64          78         187      352
## 6       2086            13            64          78         187      352
##   NA employed total unemployPC    Origin FlowOD coords.x1 coords.x2
## 1 NA     3257  3609   9.753394 E02001370     15  338686.2    396978
## 2 NA     3257  3609   9.753394 E02001348    161  338686.2    396978
## 3 NA     3257  3609   9.753394 E02001365     37  338686.2    396978
## 4 NA     3257  3609   9.753394 E02001404      7  338686.2    396978
## 5 NA     3257  3609   9.753394 E02001360     37  338686.2    396978
## 6 NA     3257  3609   9.753394 E02001371     24  338686.2    396978

And now the equivalent for Origin (Ignore the warning message which appears after creating Distttw4):

Distttw3 <- merge(Distttw2,MSOAXY,by.x="Origin",by.y="geo_code")
Distttw4 <- merge(lpoolemploy,Distttw3,by.x="geocode",by.y="Origin",all=TRUE)

## Warning in merge.data.frame(lpoolemploy, Distttw3, by.x = "geocode", by.y =
## "Origin", : column name 'geocode' is duplicated in the result

head(Distttw4)

##     geocode  ID.x       label.x
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
##                                             type.x typeid.x employeePT.x
## 1 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 2 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 3 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 4 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 5 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 6 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
##   employeeFT.x selfempwithPT.x selfempwithFT.x selfempnoPT.x selfempnoFT.x
## 1         2086              13              64            78           187
## 2         2086              13              64            78           187
## 3         2086              13              64            78           187
## 4         2086              13              64            78           187
## 5         2086              13              64            78           187
## 6         2086              13              64            78           187
##   unemploy.x NA.x employed.x total.x unemployPC.x   geocode  ID.y
## 1        352   NA       3257    3609     9.753394 E02001350 11249
## 2        352   NA       3257    3609     9.753394 E02001401 11299
## 3        352   NA       3257    3609     9.753394 E02001366 11265
## 4        352   NA       3257    3609     9.753394 E02001376 11275
## 5        352   NA       3257    3609     9.753394 E02001382 11280
## 6        352   NA       3257    3609     9.753394 E02001361 11260
##         label.y                                           type.y typeid.y
## 1 Liverpool 004 Middle Super Output Areas and Intermediate Zones   MSOAIZ
## 2 Liverpool 055 Middle Super Output Areas and Intermediate Zones   MSOAIZ
## 3 Liverpool 020 Middle Super Output Areas and Intermediate Zones   MSOAIZ
## 4 Liverpool 030 Middle Super Output Areas and Intermediate Zones   MSOAIZ
## 5 Liverpool 036 Middle Super Output Areas and Intermediate Zones   MSOAIZ
## 6 Liverpool 015 Middle Super Output Areas and Intermediate Zones   MSOAIZ
##   employeePT.y employeeFT.y selfempwithPT.y selfempwithFT.y selfempnoPT.y
## 1          780         1883              13              53            60
## 2          781         2275              17             105            96
## 3          923         2355              17              64            90
## 4          803         1572              18              62            75
## 5          835         2215              16              92            76
## 6          848         1863              10              45            70
##   selfempnoFT.y unemploy.y NA.y employed.y total.y unemployPC.y FlowOD
## 1           183        415   NA       2972    3387    12.252731     50
## 2           186        232   NA       3460    3692     6.283857     13
## 3           236        582   NA       3685    4267    13.639559     11
## 4           118        635   NA       2648    3283    19.342065      9
## 5           234        200   NA       3468    3668     5.452563      3
## 6           170        412   NA       3006    3418    12.053833      4
##   coords.x1.x coords.x2.x coords.x1.y coords.x2.y
## 1    340083.4    396003.7    338686.2      396978
## 2    342699.6    385411.4    338686.2      396978
## 3    338261.9    392714.7    338686.2      396978
## 4    336835.5    391054.2    338686.2      396978
## 5    341247.0    389565.8    338686.2      396978
## 6    337033.7    393676.8    338686.2      396978

Next, replace the column labels:

colnames(Distttw4) <- c("Origin","OriginID","Originlabel","Origintype","OrigintypeID","OemployeePT","OemployeeFT","OselfempwithPT","OselfempwithFT","OselfempnoPT","OselfempnoFT","Ounemploy","NA","Oemployed","Ototal","OunemployPC","Destination","DestinationID","Destinationlabel","Destinationtype","DestinationtypeID","DemployeePT","DemployeeFT","DselfempwithPT","DselfempwithFT","DselfempnoPT","DselfempnoFT","Dunemploy","NA","Demployed","Dtotal","DunemployPC","FlowOD","Deast","Dnorth","Oeast","Onorth") 
head(Distttw4)

##      Origin OriginID   Originlabel
## 1 E02001347    11246 Liverpool 001
## 2 E02001347    11246 Liverpool 001
## 3 E02001347    11246 Liverpool 001
## 4 E02001347    11246 Liverpool 001
## 5 E02001347    11246 Liverpool 001
## 6 E02001347    11246 Liverpool 001
##                                         Origintype OrigintypeID
## 1 Middle Super Output Areas and Intermediate Zones       MSOAIZ
## 2 Middle Super Output Areas and Intermediate Zones       MSOAIZ
## 3 Middle Super Output Areas and Intermediate Zones       MSOAIZ
## 4 Middle Super Output Areas and Intermediate Zones       MSOAIZ
## 5 Middle Super Output Areas and Intermediate Zones       MSOAIZ
## 6 Middle Super Output Areas and Intermediate Zones       MSOAIZ
##   OemployeePT OemployeeFT OselfempwithPT OselfempwithFT OselfempnoPT
## 1         829        2086             13             64           78
## 2         829        2086             13             64           78
## 3         829        2086             13             64           78
## 4         829        2086             13             64           78
## 5         829        2086             13             64           78
## 6         829        2086             13             64           78
##   OselfempnoFT Ounemploy NA Oemployed Ototal OunemployPC Destination
## 1          187       352 NA      3257   3609    9.753394   E02001350
## 2          187       352 NA      3257   3609    9.753394   E02001401
## 3          187       352 NA      3257   3609    9.753394   E02001366
## 4          187       352 NA      3257   3609    9.753394   E02001376
## 5          187       352 NA      3257   3609    9.753394   E02001382
## 6          187       352 NA      3257   3609    9.753394   E02001361
##   DestinationID Destinationlabel
## 1         11249    Liverpool 004
## 2         11299    Liverpool 055
## 3         11265    Liverpool 020
## 4         11275    Liverpool 030
## 5         11280    Liverpool 036
## 6         11260    Liverpool 015
##                                    Destinationtype DestinationtypeID
## 1 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 2 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 3 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 4 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 5 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 6 Middle Super Output Areas and Intermediate Zones            MSOAIZ
##   DemployeePT DemployeeFT DselfempwithPT DselfempwithFT DselfempnoPT
## 1         780        1883             13             53           60
## 2         781        2275             17            105           96
## 3         923        2355             17             64           90
## 4         803        1572             18             62           75
## 5         835        2215             16             92           76
## 6         848        1863             10             45           70
##   DselfempnoFT Dunemploy NA Demployed Dtotal DunemployPC FlowOD    Deast
## 1          183       415 NA      2972   3387   12.252731     50 340083.4
## 2          186       232 NA      3460   3692    6.283857     13 342699.6
## 3          236       582 NA      3685   4267   13.639559     11 338261.9
## 4          118       635 NA      2648   3283   19.342065      9 336835.5
## 5          234       200 NA      3468   3668    5.452563      3 341247.0
## 6          170       412 NA      3006   3418   12.053833      4 337033.7
##     Dnorth    Oeast Onorth
## 1 396003.7 338686.2 396978
## 2 385411.4 338686.2 396978
## 3 392714.7 338686.2 396978
## 4 391054.2 338686.2 396978
## 5 389565.8 338686.2 396978
## 6 393676.8 338686.2 396978

Mapping flows

You will now map flows as lines with thicknesses indicating the size of the flows:

# Plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(Distttw4[, 1])) {lines(Distttw4 [i, c(36, 34)], Distttw4 [i, c(37, 35)], lwd=0.2 + Distttw4 [i,33]/100, col="blue")}

The map is almost untelligible at present, so we will modify the flows to make the picture clearer. Firstly, we’ll remove the internal flows (people who lived and worked within the same MSOA).

Compute the distances between the Origins and Destinations.
Retain only rows with distance > 0; this removes internal flows. At this stage we create a new file, as we will want the rows with a distance of zero for a later stage.

# Compute the distances between the Origins and Destinations
Distttw4$Dist <- sqrt((Distttw4$Oeast-Distttw4$Deast)**2+(Distttw4$Onorth-Distttw4$Dnorth)**2)
# Retain rows where Dist > 0
Distttw0 <- Distttw4[ which(Distttw4$Dist > 0),]

We will further refine the data to exclude all flows of <= 250

# Retain rows where FlowOD > 250
DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 250),]
# Plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(DistttwCut[, 1])) {lines(DistttwCut [i, c(36, 34)], DistttwCut [i, c(37, 35)], lwd=0.2 + DistttwCut [i,33]/100, col="blue")}

That’s better — it is now possible to discern some major trends in commuting across Liverpool. Try some variants other flow cutoffs (where DistttwCutTest is the new (output) file and Distttw0 is the input).

Linking, mapping and analysing employment data

Next we will map the percentage of unemployed people. Firstly, you need to join the employment data to the MSOA areas so that the employment counts and percentages can be mapped:

lpoolMSOA@data=data.frame(lpoolMSOA@data, lpoolemploy[match(lpoolMSOA@data$geo_code, lpoolemploy$geocode),])

Now we’ll map the percentage of people who were unemployed in 2011 (unemployPC) and add a legend to the map. At this stage, we need two new libraries - RColorBrewer and classInt.

#Download / install and then load RColorBrewer
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())
#Download / install and then load RColorBrewer
library("classInt")

We select a class interval scheme to display the values, and add a legend.

fj5 <- classIntervals(lpoolMSOA$unemployPC, n=5, style="fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(lpoolMSOA, col=fj5Colours, pch=19)
legend("topleft", fill=attr(fj5Colours, "palette"),legend=names(attr(fj5Colours, "table")), bty="n")

To map rates, it’s useful to be able to round the values for inclusion in the legend. The first line of the code above can be modified to do this; we can also change the “topleft” element under the legend to more precisely position the legend (obviously, changing the x and y coordinates moves the legend); you will also add a scale bar and a north arrow:

fj5 <- classIntervals(round(lpoolMSOA$unemployPC,digits=2), n = 5, style = "fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(lpoolMSOA, col=fj5Colours, pch=19)
legend(x = 329500, y = 387500, fill=attr(fj5Colours, "palette"),legend=names(attr(fj5Colours, "table")), bty="n") 
# Add a scale bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(331000, 381500), scale = 4000, fill = c("white", "black"), plot.grid = F)
text(331000, 381000, "0", cex = 0.6)
text(333000, 381000, "2", cex = 0.6)
text(335000, 381000, "4km", cex = 0.6)
# Add a North arrow
SpatialPolygonsRescale(layout.north.arrow(1), offset= c(344000,395000), scale = 2000, plot.grid=F)

The six figure numbers are eastings and northings and these can be changed with the result that the position of the features they refer to change. To determine the coordinates of a location type:

locator()

Click on the plot at the point for which you want locational information. Then click the ESC key. The coordinates of the location will then appear in the R console.

Note that you can export the lpoolMSOA file into shapefile format which can then be opened in QGIS. This is done with:

writeSpatialShape(lpoolMSOA,"lpmsoa")

## Warning: use rgdal::readOGR or sf::st_read

## Warning: use rgdal::readOGR or sf::st_read

This creates the output shapefile ‘lpmsoa’. If you prefer to use QGIS to make maps of MSOA area features you can do so following gudiance provided in the previous practical exercises.

One useful way of exploring the spatial patterning in data values (here, unemployment percentages) is to compute a statistic called the Moran’s I spatial autocorrelation coefficient. Positive values of I indicate clustering of similar values, negative values of I indicate clustering of dissimilar values and values close to zero indicate zero spatial autocorrelation (a ‘random’ spatial pattern).

First, we need to install and load the package ‘spdep’:

#Download / install and then load spdep
install.packages("spdep")
library("spdep")

Next we need to create list indicating which MSOAs share boundaries with which other MSOAs. This is done using the command ‘poly2nb’. This must then be converted into a weights file using ‘nblistw’. The weights file represents the spatial relationship between MSOAs and if an MSOA borders five other MSOAs then each neighbouring MSOA is given a weight of 0.2 since the weights for each set of neighbours of each MSOA sum to one and 1/5 = 0.2. With this weights file we can measure the average difference between unemployment percentages at each MSOA and at the neighbouring MSOAs. A value of I close to one would indicate that the unemployment percentage values at each MSOA tend to be similar to the unemployment percentage values at other MSOAs with which they share a border - thus, the values cluster.

MSOA.nb <- poly2nb(lpoolMSOA)
MSOA.wt <- nb2listw(MSOA.nb)
MSOA.wt

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 61 
## Number of nonzero links: 302 
## Percentage nonzero weights: 8.116098 
## Average number of links: 4.95082 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 61 3721 61 26.82893 251.0616

#Summarise the weights values
summary(unlist(MSOA.wt$weights))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1250  0.1667  0.1667  0.2020  0.2500  1.0000

Using the new weights file we can now compute I:

moransUnemploy <- moran.test(lpoolMSOA$unemployPC, MSOA.wt)
moransUnemploy

## 
##  Moran I test under randomisation
## 
## data:  lpoolMSOA$unemployPC  
## weights: MSOA.wt    
## 
## Moran I statistic standard deviate = 5.9407, p-value = 1.419e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.466154721      -0.016666667       0.006605424

The value of 0.466 indicates a moderate level of spatial clustering - that is, neighbouring unemployment percentages tend to be quite similar on average. The p-value of 1.419e-09 indicates that the clustering is highly significant (that is, it is unlikely to have occurred by chance) - values closer to zero indicate a higher significance level.

Values such as 1.419e-09 may not mean much to you. This is scientific notation and is a convenient way of displaying very small or very large values. In the case of 1.419e-09, this means that the decimal place is moved 9 positions to the left, so 1.419e-09 = 0.000000001419.

Exploring the determinants of commuting

Next, we are going to explore how the distance between MSOAs and their populations are related to the size of commuting flows. We might expect flows to be larger between MSOAs which are close together. Also flows to MSOAs which have large populations (and, by implications, more housing) may also be large. The relationships between variables such as flow size and distance can be analysed using regression. The regression approach you may be familiar with from Excel, for example, is called ordinary least squares (OLS) regression. OLS regression is used to explore how a set of variables (‘independent variables’, for example, population size of MSOAs and distance between MSOAs) are related to a single variable (for example, flows between MSOAs)

OLS is used here to explore how flows relate to distance, origin populations (number of employed persons), and destination populations.

We run the OLS model with intra-MSOA (internal) flows removed. Research on commuting (and migration) flows suggests that we should transform the variables using logs (that is, they are put on a log scale; zeros can’t be logged, so we add 0.5 to each value):

lreg_ttw <- lm(log(FlowOD+0.5) ~ log(Dist+0.5) + log(Oemployed+0.5) + log(Demployed+0.5), data = Distttw0)
summary(lreg_ttw)

## 
## Call:
## lm(formula = log(FlowOD + 0.5) ~ log(Dist + 0.5) + log(Oemployed + 
##     0.5) + log(Demployed + 0.5), data = Distttw0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7790 -0.7238 -0.0659  0.5940  3.2668 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.67385    0.99651   4.690 2.83e-06 ***
## log(Dist + 0.5)      -0.86549    0.03134 -27.619  < 2e-16 ***
## log(Oemployed + 0.5)  0.91814    0.08527  10.768  < 2e-16 ***
## log(Demployed + 0.5) -0.25642    0.08485  -3.022  0.00253 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.089 on 3571 degrees of freedom
## Multiple R-squared:  0.1972, Adjusted R-squared:  0.1965 
## F-statistic: 292.4 on 3 and 3571 DF,  p-value: < 2.2e-16

The estimate values (parameter estimates) for Dist, Oemployed and Oemployed are the key figures here.

The value for Dist is negative, while the value for Oemployed is positive and that for Demployed is negative. This indicates that Dist (distance) has a negative effect - as distance increases, flows decrease. This fits with what we’d expect. The positive value for Oemployed also corresponds to what we’d expect - commuting flows are likely to be larger FROM MSOAs with large populations. The negative value for Demployed seems less intuitive until we note that the employment data we are using are for area of residence and NOT area of workplace. Hence, there may not be a ‘pull factor’ for areas with large numbers of employed residents - instead the ‘pull’ is likely to be towards areas with large numbers of jobs. It is possible to instead assess flows by MSOA of workplace (the area in which people work) rather than (as here) the MSOA of residence (where people live), but will work only with residence data in this exercise.

The astrices (** or ***) at the end of the columns indicate that all of the parameter estimates are statistically significant.

But, there is a problem with using OLS… Flows are counts which are unlikely to have a normal distribution - there isn’t an equal proportion of very small and very large flows, with the bulk in the middle range. Instead, there are many small flows and few large flows. OLS is designed to work with data which have a normal distribution, and so its application to count data like commuting flows is not sensible… An alternative approach is Poisson regression, which is appropriate for flow data. We can apply it using the glm function. Again, we exclude internal flows.

preg_ttw <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = Distttw0, family = poisson())
summary(preg_ttw)

## 
## Call:
## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(), 
##     data = Distttw0)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.316   -4.861   -2.780    0.079   49.810  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.262e+00  2.206e-02  193.21   <2e-16 ***
## Dist        -1.520e-04  1.269e-06 -119.76   <2e-16 ***
## Oemployed    2.953e-04  5.139e-06   57.46   <2e-16 ***
## Demployed   -3.459e-04  4.938e-06  -70.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 171205  on 3574  degrees of freedom
## Residual deviance: 147039  on 3571  degrees of freedom
## AIC: 162888
## 
## Number of Fisher Scoring iterations: 6

As for OLS, the parameter estimate for Dist is negative, that for Oemployed is positive and again that for Demployed is negative.

There are still problems with applying Poisson regression but it is conceptually an improvement on OLS regression.

We know that data on small flows are unreliable. So, try re-running the Poisson regression model with different (small) flow thresholds (following the steps outlined for mapping flows). Use thresholds of 5 and 10 and compare results to those using all data (that is, a threshold of zero). As a reminder, this can be done with (for the example of a cut-off of 5 people):

DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 5),]
preg_ttw5 <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = DistttwCut, family = poisson())
summary(preg_ttw5)

## 
## Call:
## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(), 
##     data = DistttwCut)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.950   -4.953   -2.910    0.152   49.087  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.584e+00  2.216e-02  206.92   <2e-16 ***
## Dist        -9.288e-05  1.282e-06  -72.47   <2e-16 ***
## Oemployed    1.970e-04  5.249e-06   37.53   <2e-16 ***
## Demployed   -3.718e-04  4.964e-06  -74.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 127232  on 2673  degrees of freedom
## Residual deviance: 114571  on 2670  degrees of freedom
## AIC: 127848
## 
## Number of Fisher Scoring iterations: 6

If you want to add additional variables (perhaps downloaded from Infuse), you could do this as follows (for a file named ‘lpoolNEW’ with the same format as the employment data and with only two sets of counts):

colnames(lpoolNEW) <- c(“DID”,“Dgeocode”,“Dlabel”,“Dtype”,“Dtypeid”,“Dcount1”,“Dcount2”)

Distttw5 <- merge(lpoolNEW,Distttw0,by.x=“geocode”,by.y=“Destination”,all=TRUE)

Then rename to indicate origins and join to the new Disttw5 file created above:

colnames(lpoolNEW) <- c(“OID”,“Ogeocode”,“Olabel”,“Otype”,“Otypeid”,“Ocount1”,“Ocount2”)

Distttw6 <- merge(lpoolNEW,Distttw5,by.x=“geocode”,by.y=“Origin”,all=TRUE)

Giving the final file Distttw6 (for Dist > 0) containing the flows, distances, employment data and the new counts.

What to do when starting or stopping an RStudio session

At the end of a session make sure you have daved your data:

save.image(file = "ttw.RData")

When starting a new session, set your working directory and open your data:

setwd("M:/ENVS257/Commuting/")
load("ttw.RData")

You can open your script file (containing your pasted commands) with File > Open and then locate the script (.R) file.

You will also need to reload the libraries - a sensible approach is to copy the library-related code (e.g., install.packages… and library…) into a section of your script file and then run these lines of code after the setwd and load lines detailed above. Then you can re-run any other lines of code (e.g., map creation, statistics, etc).

Note that maps themselves are not saved - the underlying data are and, if you save the code in a script file, then you can simply re-map the existing data by re-running the appropriate code.

Summary

The various analysis that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, however, examples of any code that you need to complete your report are contained in this document. However, you may wish to make use of the knowledge developed in previous practicals.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.