Migration and flow mapping

Introduction

In this exercise, you will explore migration in England between 2000 and 2001. You will do this using two kinds of data:

Area data (numbers of people who moved into, or out of, an area between 2000 and 2001)
Flow (or interaction) data

Flow data represent movements of people, information, resources, etc between places. For some background to the kinds of flow data we will be working with, see:

http://census.ukdataservice.ac.uk/get-data/flow-data.aspx

Project Tasks

You will write a short executive report (max 500 words) that will answer the following questions:

What are the main migration patterns in England?
Where are flows largest? What might this suggest?
How important are distance, origin population, and destination population in explaining migration flows between the districts of England? How do changes in flow thesholds affect the results?

In addition to answering these two questions in your report, you should present as a minimum the following information:

A map of in-migration rates and a map of out-migration rates.
A map showing migration flows between the districts of England (with your own flow thresholds).
A table summarising Poisson regression outputs for flow thresholds of 0 and 10.

Overall, your report should not have more than 10 maps or charts. All maps must be generated within R. You should make use of the knowledge you have developed in previous practical exercises. This may enable you explore the data in different ways or to produce better maps which follow standard cartographic conventions.

Downloading flow data

We will first download some interaction data from the CIDER (Centre for Interaction Data Estimation and Research) website.

Visit the CIDER website: http://cider.census.ac.uk/
Click on wicid - the flexible query builder. Then click Login using Shibboleth. Login with University of Liverpool. Enter your University of Liverpool username and password. Click Continue to the Flow Data.
Click on the Data tab. Click 'Select by dataset and table'. Select 'Migration data', then select 'Census 2000-01'.
We are going to look at Migration across the districts of England. So, select '2001 SMS Level 1' (District level migration data from the 2001 Census)
Select Table 1 'Age by sex'. Click on box '1' (Total, Total) - we are selecting only the total population. Click 'Add selected cells'.
Click the wicid tab and click the Data tab again. Under 'Other data' click the 'Population data' link and select 'Search for population data sets matching your selected interaction data'. Then, select '2001 SMS Level 1 PAR' and 'Select Matching Items'.
Click on the Geography tab. Select 'Select or edit origins'. Click on 'Quick selection'. Select '2001 GB Districts plus Northern Ireland'. Click 'Confirm that you wish to proceed using this geography'.
Click 'Switch to selection of destinations'. Click 'Copy selection' and then click 'Set the destinations to be the same as the currently selected origins'.
Click 'produce output'. Click Run, then Continue to Output pages.
Select 'Tabular output'. Specify the Output layout to 'Origin-destination pair list'. Under 'Origin and destination labels' select Change. Change the 1st label setting to 'District Code' and click 'Change labels'. Under 'Output format' select 'Comma separated values'. Then, click 'Preview Output and Download'
Click 'Download output file' and 'Download now'. Save the file to your working directory (or create a new one) - keep the default file name.

Downloading area data

Next, we will download 2001 district boundaries (as a shapefile) for England.

Visit the website http://edina.ac.uk/census/ - click LOGIN via UK federation; login with University of Liverpool. Enter your University of Liverpool username and password (if requested). Click the 'Easy Download' link. Under 'England' > '2001 boundaries' select 'English District 2001 (includes UAs and London Boroughs)'.
Under 'English Districts 2001' select the option next to Arcview SHAPE: 'zip (39.05 Mb)'.
Unzip the downloaded map, and place the contents inside your working directory

Importing the flow data into R

Our first task in R will be to import the flow data.

Open R. As usual, you must set your working directory.

# For example:
setwd("M:/ENVS257/Migration/")

If you open the csv file in a text editor such as Wordpad or Notepad, you'll see there are four header lines. We don't want to keep any of these. The csv file can be read in as follows.

# Read csv file, skipping the first four rows:
mig2001 <- read.csv("wicid_output.csv", header = FALSE, skip = 4)

Next, preview the file by looking at the first five rows.

head(mig2001)

##     V1   V2  V3   V4     V5
## 1 00AA 00AA 154 7181   7181
## 2 00AA 00AB   0 7181 163929
## 3 00AA 00AC   9 7181 314559
## 4 00AA 00AD   3 7181 218310
## 5 00AA 00AE   9 7181 263463
## 6 00AA 00AF  15 7181 295526

The data labels are as follows:

V1: Code for the Origin district
V2: Code for the Destination district
V3: Number of persons who lived in the Origin district in 2000 and had moved to the Destination district by 2001
V4: Number of persons who lived in the Origin district in 2001
V5: Number of persons who lived in the Destination district in 2001

We will rename these to something more meaningful:

colnames(mig2001) <- c("Origin", "Destination", "FlowOD", "PersonsO", "personsD")

Check that this has worked by again looking at the first five rows of the data.

head(mig2001)

##   Origin Destination FlowOD PersonsO personsD
## 1   00AA        00AA    154     7181     7181
## 2   00AA        00AB      0     7181   163929
## 3   00AA        00AC      9     7181   314559
## 4   00AA        00AD      3     7181   218310
## 5   00AA        00AE      9     7181   263463
## 6   00AA        00AF     15     7181   295526

Next, we will summarise the variables.

summary(mig2001)

##      Origin        Destination         FlowOD          PersonsO      
##  00AA   :   409   00AA   :   409   Min.   :     0   Min.   :   2136  
##  00AB   :   409   00AB   :   409   1st Qu.:     0   1st Qu.:  85906  
##  00AC   :   409   00AC   :   409   Median :     3   Median : 112970  
##  00AD   :   409   00AD   :   409   Mean   :    37   Mean   : 143728  
##  00AE   :   409   00AE   :   409   3rd Qu.:     9   3rd Qu.: 165693  
##  00AF   :   409   00AF   :   409   Max.   :127999   Max.   :1681152  
##  (Other):164829   (Other):164829   NA's   :2        NA's   :2        
##     personsD      
##  Min.   :   2136  
##  1st Qu.:  85906  
##  Median : 112970  
##  Mean   : 143728  
##  3rd Qu.: 165693  
##  Max.   :1681152  
##  NA's   :2

The two NA's indicate rows at the end of the data which were textual information included in the downloaded file. You don't need to worry about these for now.

Note that the miniumum, 1st quartile and median FlowOD values are 0, 0, and 3 respectively. In 2001, Census data were modified using a small adjustment procedure which randomly converted all values of 0, 1, 2 or 3 to values of 0 or 3. So, there are no values of 1 or 2. This procedure was intended to help protect confidentiality of individuals. But, it means that smaller flows are of limited value…

If you want to save your workspace, so you can open it up later on, you can use the save.image function.

# This will save all objects currently loaded in the workspace
save.image(file = "myfile.RData")

After saving, you can re-load the objects contained in this saved file at any time (e.g. after closing R) using the load command. The .RData file is stored in the working directory defined earlier.

# Remember that you will need to set your working directory before trying to
# load - otherwise R will not know where to find you RData file
load("myfile.RData")

The flow data file includes values for the whole of Britain and for Northern Ireland, but we will generating maps which refer only to England.

Downloading migration data for areas

We will also be working with migration data for areas. These data can be downloaded from CASWEB (http://casweb.mimas.ac.uk/; 2001 districts and unitary authorities Table KS024), but they have already been downloaded for you and are available through VITAL (See Practicals directory).

# Read csv file:
mig2001area <- read.csv("distmig2001.csv", header = TRUE)

Preview the file by looking at the first five rows.

head(mig2001area)

##   Zone.Code            Zone.Name ks0240001 ks0240002 ks0240003 ks0240004
## 1      00AA       City of London      7185      1652      1065       308
## 2      00AB Barking and Dagenham    163944     16241      6971       789
## 3      00AC               Barnet    314564     42902     18560      5576
## 4      00AD               Bexley    218307     18376      7873       642
## 5      00AE                Brent    263464     38356     14793      5633
## 6      00AF              Bromley    295532     29449     12590      1711
##   ks0240005 ks0240006 ks0240007
## 1       122       157      1022
## 2      1637      6844      7064
## 3      3598     15168     20311
## 4      1337      8524      8727
## 5      4732     13198     18317
## 6      2025     13123     13266

Mapping districts

As the next step, we need to import the England districts shapefile downloaded earlier.

As in previous practicals, we will need to make use of functions which are not contained in the base R installation.

You need to load 'maptools' and 'rgeos'; these packages include functions for reading and manipulating GIS data.

# Choose the CRAN mirror where the packages are going to be downloaded from
options(repos = c(CRAN = "http://www.stats.bris.ac.uk/R/"))

# Download / install and then load sp
install.packages("sp", depend = TRUE, lib = getwd())
library("sp", lib.loc = getwd())

# Download / install and then load maptools
install.packages("maptools", depend = TRUE, lib = getwd())
library(maptools, lib.loc = getwd())

# Download / install and then load stringr
install.packages("stringr", depend = TRUE, lib = getwd())
library("stringr", lib.loc = getwd())

# Download / install and then load rgeos
install.packages("rgeos", depend = TRUE, lib = getwd())
library("rgeos", lib.loc = getwd())

Now we will read in the districts data into a type of object called a SpatialPolygonsDataFrame.

Districts <- readShapePoly("England_dt_2001_area.shp")

We can see what the SpatialPolygonsDataFrame looks like using the plot() function.

# A basic plot of the Districts SpatialPolygonsDataFrame
plot(Districts)

plot of chunk unnamed-chunk-16

Now we have a map of districts in England we can consider mapping migration flows between districts.

First, we need to derive some information from the Districts shapefile. We do this using SpatialPointsDataFrame. To map flows between zones we also need the centroids of the districts, and this is achieved using the coordinates command.

DistrictsXY <- SpatialPointsDataFrame(coordinates(Districts), data = as(Districts, 
    "data.frame")[c("LABEL")])

DistrictsXY is then joined to the mig2001 data. We need to join these two files twice - once linking LABEL (in DistrictsXY) to Origin in Mig2001 and once to Destination in Mig2001. First we'll join to Destination and then the output from this will be joined to Origin.

DistMig <- merge(mig2001, DistrictsXY, by.x = "Destination", by.y = "LABEL")
head(DistMig)

##   Destination Origin FlowOD PersonsO personsD coords.x1 coords.x2
## 1        00AA   33UF      0    98375     7181    532464    181220
## 2        00AA   00PM      0    90940     7181    532464    181220
## 3        00AA   32UC      0   130453     7181    532464    181220
## 4        00AA   31UG      0    47839     7181    532464    181220
## 5        00AA   20UB      0    53683     7181    532464    181220
## 6        00AA   00JA      0   156059     7181    532464    181220

colnames(DistMig) <- c("Destination", "Origin", "FlowOD", "PersonsO", "PersonsD", 
    "DestinationX", "DestinationY")
head(DistMig)

##   Destination Origin FlowOD PersonsO PersonsD DestinationX DestinationY
## 1        00AA   33UF      0    98375     7181       532464       181220
## 2        00AA   00PM      0    90940     7181       532464       181220
## 3        00AA   32UC      0   130453     7181       532464       181220
## 4        00AA   31UG      0    47839     7181       532464       181220
## 5        00AA   20UB      0    53683     7181       532464       181220
## 6        00AA   00JA      0   156059     7181       532464       181220

DistMigF <- merge(DistMig, DistrictsXY, by.x = "Origin", by.y = "LABEL")
colnames(DistMigF) <- c("Origin", "Destination", "FlowOD", "PersonsO", "PersonsD", 
    "DestinationX", "DestinationY", "OriginX", "OriginY")
head(DistMigF)

##   Origin Destination FlowOD PersonsO PersonsD DestinationX DestinationY
## 1   00AA        00MS      0     7181   217450       442363       113232
## 2   00AA        00FN      0     7181   279921       459075       304818
## 3   00AA        00JA      3     7181   156059       518388       302630
## 4   00AA        20UD      0     7181    85074       412505       548294
## 5   00AA        43UB      3     7181   121936       512049       163233
## 6   00AA        22UH      3     7181   120884       548555       203312
##   OriginX OriginY
## 1  532464  181220
## 2  532464  181220
## 3  532464  181220
## 4  532464  181220
## 5  532464  181220
## 6  532464  181220

Mapping migration rates

Now we will calculate in- and -out migration rates and map them. You have already downloaded a district shapefile, but we will need a cleaned version of this for the next stage of the analysis. The cleaned file is available through VITAL (Exercise 4 directory - download and unzip the file 'DistrictsUN.zip' into your working directory). The shapefile is imported as before.

DistrictsUN <- readShapePoly("DistrictsUN.shp")

DistrictsUN@data = data.frame(DistrictsUN@data, mig2001area[match(DistrictsUN@data$LABEL, 
    mig2001area$Zone.Code), ])

The data labels are as follows:

ks0240001: All people
ks0240002: People who are migrants
ks0240003: People who moved into the area from within the UK
ks0240004: People who moved into the area from outside the UK
ks0240005: People with no usual address one year before Census
ks0240006: People who moved within the area
ks0240007: People who moved out of the area

Now we'll map the number of people who are migrants (ks0240002) and add a legend to the map. At this stage, we need two new libraries - RColorBrewer and classInt.

# Download / install and then load RColorBrewer
install.packages("RColorBrewer", depend = TRUE, lib = getwd())
library("RColorBrewer", lib.loc = getwd())
# Download / install and then load RColorBrewer
install.packages("classInt", depend = TRUE, lib = getwd())
library("classInt", lib.loc = getwd())

We select class interval scheme to display the values, and add a legend.

fj5 <- classIntervals(DistrictsUN$ks0240002, n = 5, style = "fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(DistrictsUN, col = fj5Colours, pch = 19)
legend("topleft", fill = attr(fj5Colours, "palette"), legend = names(attr(fj5Colours, 
    "table")), bty = "n")

plot of chunk unnamed-chunk-23

Clearly, this map isn't of much use for exploring migration - we need to compute rates. As an example, in-migration rates are computed with:

DistrictsUN@data$inMig <- (DistrictsUN@data$ks0240003 + DistrictsUN@data$ks0240004)/DistrictsUN@data$ks0240001

Map the in-migration rates. Next, compute the out-migration rates (with ks0240007 as the numerator). You could also try mapping internal flows as a rate.

To map rates, it's useful to be able to round the values for inclusion in the legend. The first line of the code above can be modified to do this; we can also change the “topleft” element under the legend to more precisely position the legend (obviosuly, changing the x and y coordinates moves the legend); you will also add a scale bar and a north arrow:

fj5 <- classIntervals(round(DistrictsUN$inMig, digits = 2), n = 5, style = "fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(DistrictsUN, col = fj5Colours, pch = 19)
legend(x = -25000, y = 386096, fill = attr(fj5Colours, "palette"), legend = names(attr(fj5Colours, 
    "table")), bty = "n")
# Add a scale bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(0, 1e+05), scale = 1e+05, 
    fill = c("white", "black"), plot.grid = F)
text(0, 125000, "0", cex = 0.6)
text(50000, 125000, "50", cex = 0.6)
text(1e+05, 125000, "100km", cex = 0.6)
# Add a North arrow
SpatialPolygonsRescale(layout.north.arrow(1), offset = c(80000, 410000), scale = 50000, 
    plot.grid = F)

plot of chunk unnamed-chunk-25

Mapping flows

Now we will map the flow data on top of the Districts. One way of doing this is to map the flows between districts as lines, with thicknesses indicating the size of the flows. Including all flows produces a very confusing map.

The internal flows (people who migrated within the same district) and small inter-district flows make it very difficult to see what is going on. So, we need to remove the internal flows and cut out smaller flows to give a clearer picture of migration patterns:

Retain only rows with OriginX >= 0 & DestinationX >= 0 & OriginY >= 0 & DestinationY >= 0; this removes all districts outside of England.
Compute the distances between the Origins and Destinations.
Retain only rows with distance > 0; this removes internal flows. At this stage we create a new file, as we will want the rows with a distance of zero for a later stage.

# Retain rows with X & Y coordinates > 0
DistMigF2 <- DistMigF[which(DistMigF$OriginX > 0 & DistMigF$DestinationX > 0 & 
    DistMigF$OriginY > 0 & DistMigF$DestinationY > 0), ]
# Compute the distances between the Origins and Destinations
DistMigF2$Dist <- sqrt((DistMigF2$OriginX - DistMigF2$DestinationX)^2 + (DistMigF2$OriginY - 
    DistMigF2$DestinationY)^2)
# Retain rows where Dist > 0
DistMigFz <- DistMigF2[which(DistMigF2$Dist > 0), ]

# Plot the districts and then map the flows
plot(Districts)
for (i in 1:length(DistMigFz[, 1])) {
    lines(DistMigFz[i, c(6, 8)], DistMigFz[i, c(7, 9)], lwd = 0.2 + DistMigFz[i, 
        3]/100, col = "blue")
}

plot of chunk unnamed-chunk-27

The map is still not very helpful, so we will refine it to exclude all flows of <= 100, and we'll reduce the impact of the largest flows through dividing by 1000, rather than 100.

# Retain rows where FlowOD > 100
DistMigCut <- DistMigFz[which(DistMigFz$FlowOD > 100), ]
# Plot the districts and then map the flows
plot(Districts)
for (i in 1:length(DistMigCut[, 1])) {
    lines(DistMigCut[i, c(6, 8)], DistMigCut[i, c(7, 9)], lwd = 0.2 + DistMigCut[i, 
        3]/1000, col = "blue")
}

plot of chunk unnamed-chunk-28

# Check all seems ok
summary(DistMigCut)

##      Origin      Destination       FlowOD         PersonsO     
##  15UE   : 114   15UF   :  96   Min.   :  101   Min.   : 24443  
##  15UF   :  96   00HG   :  69   1st Qu.:  139   1st Qu.: 92502  
##  35UC   :  72   15UE   :  66   Median :  227   Median :140024  
##  00CN   :  62   15UC   :  57   Mean   :  612   Mean   :183972  
##  00HG   :  48   35UC   :  56   3rd Qu.:  443   3rd Qu.:238638  
##  00MW   :  45   00MR   :  54   Max.   :22076   Max.   :977085  
##  (Other):3214   (Other):3253                                   
##     PersonsD       DestinationX     DestinationY       OriginX      
##  Min.   :  7181   Min.   :132023   Min.   : 17580   Min.   :132023  
##  1st Qu.: 92502   1st Qu.:388454   1st Qu.:152000   1st Qu.:388211  
##  Median :134248   Median :459163   Median :189463   Median :455413  
##  Mean   :179012   Mean   :441777   Mean   :236789   Mean   :439108  
##  3rd Qu.:218344   3rd Qu.:526281   3rd Qu.:326032   3rd Qu.:526129  
##  Max.   :977085   Max.   :647811   Max.   :638991   Max.   :647811  
##                                                                     
##     OriginY            Dist       
##  Min.   : 17580   Min.   :   124  
##  1st Qu.:155359   1st Qu.: 14155  
##  Median :189643   Median : 22232  
##  Mean   :238852   Mean   : 34166  
##  3rd Qu.:329548   3rd Qu.: 35563  
##  Max.   :638991   Max.   :408510  
##

That's better! It is now possible to discern some major trends in migration across England.

Try some other flow cutoffs (where DistMigCutTest is the new (output) file and DistMigFz is the input). For example, mapping flows of greater than 1000 people:

DistMigCutTest <- DistMigFz[which(DistMigFz$FlowOD > 1000), ]
plot(Districts)
for (i in 1:length(DistMigCutTest[, 1])) {
    lines(DistMigCutTest[i, c(6, 8)], DistMigCutTest[i, c(7, 9)], lwd = 0.2 + 
        DistMigCutTest[i, 3]/1000, col = "blue")
}

plot of chunk unnamed-chunk-29

Exploring the determinants of migration

Next, we are going to explore how the distance between districts and their populations are related to the size of migration flows. We might expect flows to be larger between districts which are close together. Also flows to districts which have large populations (and, by implications, more housing) may also be large. The relationships between variables such as flow size and distance can be analysed using regression. The regression approach you may be familiar with from Excel, for example, is called ordinary least squares (OLS) regression. OLS regression is used to explore how a set of variables ('independent variables', for example, population size of districts and distance between districts) are related to a single variable (for example, flows between districts)

OLS is used here to explore how flows relate to distance, origin populations, and destination populations.

We run the OLS model with intra-district (internal) flows removed. Research on migration flows suggests that we should transform the variables using logs (that is, they are put on a log scale; zeros can't be logged, so we add 0.5 to each value):

lreg_dist <- lm(log(FlowOD + 0.5) ~ log(Dist + 0.5) + log(PersonsO + 0.5) + 
    log(PersonsD + 0.5), data = DistMigFz)
summary(lreg_dist)

## 
## Call:
## lm(formula = log(FlowOD + 0.5) ~ log(Dist + 0.5) + log(PersonsO + 
##     0.5) + log(PersonsD + 0.5), data = DistMigFz)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.532 -1.031  0.142  0.992  6.414 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.94920    0.11775   -8.06  7.6e-16 ***
## log(Dist + 0.5)     -1.01691    0.00458 -222.25  < 2e-16 ***
## log(PersonsO + 0.5)  0.65657    0.00561  116.94  < 2e-16 ***
## log(PersonsD + 0.5)  0.59166    0.00561  105.38  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.32 on 151706 degrees of freedom
## Multiple R-squared:  0.375,  Adjusted R-squared:  0.375 
## F-statistic: 3.04e+04 on 3 and 151706 DF,  p-value: <2e-16

The estimate values (parameter estimates) for Dist, PersonsO and PersonsD are the key figures here.

The value for Dist is negative, while the values for PersonsO and PersonsD are positive. This indicates that Dist (distance) has a negative effect - as distance increases, flows decrease. This fits with what we'd expect. The positive values for PersonsO and PersonsD also correspond to what we'd expect - migration flows are likely to be larger FROM districts with large populations, and they are likely to be larger TO districts with large populations. In short, there is a push and pull factor in terms of population size. The three astrices (***) at the end of the columns indicate that all of the parameter estimates are statistically significant.

But, there is a problem with using OLS… Flows are counts which are unlikely to have a normal distribution - there isn't an equal proportion of very small and very large flows, with the bulk in the middle range. Instead, there are many small flows and few large flows. OLS is designed to work with data which have a normal distribution, and so its application to count data like migration flows is not sensible… An alternative approach is Poisson regression, which is appropriate for flow data. We can apply it using the glm function. Again, we exclude internal flows.

preg_distgtzero <- glm(FlowOD ~ Dist + PersonsO + PersonsD, data = DistMigFz, 
    family = poisson())
summary(preg_distgtzero)

## 
## Call:
## glm(formula = FlowOD ~ Dist + PersonsO + PersonsD, family = poisson(), 
##     data = DistMigFz)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
##  -24.2    -3.7    -0.7     2.4   377.5  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.19e+00   1.36e-03    3810   <2e-16 ***
## Dist        -2.11e-05   1.08e-08   -1940   <2e-16 ***
## PersonsO     1.64e-06   4.24e-09     386   <2e-16 ***
## PersonsD     1.43e-06   4.44e-09     323   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 19039624  on 151709  degrees of freedom
## Residual deviance: 10947342  on 151706  degrees of freedom
## AIC: 11353920
## 
## Number of Fisher Scoring iterations: 7

As for OLS, the parameter estimates for Dist are negative and those for PersonsO and PersonsD are positive. Values such as 2.11e-05 may not mean much to you. This is scientific notation and is a conveninet way of displaying very small or very large values. In the case of -2.11e-05, this means that the decimal place is moved 5 positions to the left, so -2.11e-05 = -0.0000211. The parameter estimate for PersonsO is larger than the value for PersonsD. This suggests that the number of persons at the origin has a larger positive effect on flow size than does the number of persons at the destination. This might suggest movement away from areas with larger populations. Have a think about whether than surprises you, given what you know about population change in the UK…

There are still problems with applying Poisson regression but it is conceptually an improvement on OLS regression.

We know that data on small flows are unreliable because of the small count adjustment procedure detailed previously. So, try re-running the Poisson regression model with different (small) flow thresholds (following the steps outlined for mapping flows). Use thresholds of 5 and 10 and compare results to those using all data (that is, a threshold of zero).

The various analysis that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, however, examples of any code that you need to complete your report are contained in this document. However, you may wish to make use of the knowledge developed in previous practicals.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.