Using the simR Package

Chris Brunsdon

2018-04-10

The simR package allows spatial interaction models (SIMs) to be fitted easily. In particular:

Getting started

Firstly - load the package. Because of the ‘tidyverse’ focus of the package, load this as well:

library(simR)
library(tidyverse)

This package essentially works with geographical data. Although one could use SIMs without drawing any maps, the author feels this is essentially lacking. For that reason, it is also worth loading the sf and tmap packages1.

library(sf)
#> Warning: package 'sf' was built under R version 3.4.4
library(tmap)

Sample data for the package relates the Local Authority areas in the East Midlands of the UK. Here is the simplefeatures table showing the areas:

data(eastmid_sf)
tm_shape(eastmid_sf) + tm_borders()

There is also travel-to-work data - basically these are counts of the number of people travelling from one Local Authority area to another (or to itself). This is organised so that each row corresponds to a single origin-destination pair:

data(eastmid_ttw)
eastmid_ttw
#> # A tibble: 1,600 x 4
#>    From         To           Count  Dist
#>    <chr>        <chr>        <int> <dbl>
#>  1 Amber Valley Amber Valley 23506   0  
#>  2 Ashfield     Amber Valley  2568  15.6
#>  3 Bassetlaw    Amber Valley    96  47.9
#>  4 Blaby        Amber Valley    20  53.5
#>  5 Bolsover     Amber Valley  2513  24.3
#>  6 Boston       Amber Valley     3  97.1
#>  7 Broxtowe     Amber Valley  1733  14.9
#>  8 Charnwood    Amber Valley    96  39.0
#>  9 Chesterfield Amber Valley   820  25.6
#> 10 Corby        Amber Valley     0  78.6
#> # ... with 1,590 more rows

The columns are:

Now fit a model:

eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count)
#> Spatial Interaction Model
#> =========================
#> 
#> Variables
#> ---------
#> orig =  From 
#> dest =  To 
#> cost =  Dist 
#> intr =  Count 
#> ---------------------
#> 
#> Model Type: Doubly Constrained 
#> Cost Denominator Parameter   8.34 
#> Cost Half Life               5.78 
#> ---------------------
#> 
#> AIC                        227127.6 
#> ====================================

What’s Going On?

At this stage, this might make little sense unless you are familiar withs SIMs. However, here are a few words of explanation. The doubly_constrained_sim function fits a doubly constrained SIM model - this predicts the interaction between origin and destination - here this is the number of people travelling to work from the origin to the destation, ensuring that the total number of people leaving the origin, and the total number arriving at the destination agree with the observed figures, taking into account a ‘cost deterrence’ effect, where more costly trips are less likely to happen, given these constraints.

The function works like dplyr type functions, so that the arguments can be specified as column names in a data frame, without having to quote them. It returns an object of class ‘sim’ (which inherits from the class ‘glm’).

The model is of the form

\[ \hat{T_{ij}} = k A_iB_j\exp\left(-\frac{ C_{ij}}{\gamma} \right) \] Where \(A_i\) and \(B_j\) are ‘balancing terms’ to make the constraints apply and \(\gamma\) is a cost deterrence parameter. ‘cost’ here is used broadly, and can generally represent any measure of effort needed to travel from zone \(i\) to zone \(j\). \(\hat{T_{ij}}\) is the interaction between zones \(i\) and \(j\). The ‘hat’ on the \(\hat{T_{ij}}\) denotes that this is the modelled interaction - the actual interaction is \(T_{ij}\) (no hat). This is the Count column in the example data set eastmid_ttw. \(\gamma\) here corresponds to the ‘Cost Denominator Parameter’ in the printout. Since units here are in Km this is just over 8 Km. When the deterrence function is exponential, as in the model here, a ‘half life’ cost can be calculated. Assuming all other characteristics of an origin and destimation are fixed, for every whole number multiple of this cost the amount of interaction is halved. Here this is a distance - and has a value of a little under 6km.

To understand the idea more it is worth reading these two articles by Adam Dennett - https://rpubs.com/adam_dennett/259068 and https://rpubs.com/adam_dennett/257231 - in many ways these are what prompted me to create the simR package.

Note that this is a ‘toy’ data set - ideally one would use more geographical detail, and a more realistic cost distance - such as travel time. Also, the cost when the origin and the destination are the same is generally not zero! However, the computing procedures are the same. In the fullness of time I’ll replace these distances with something more convincing…

Using the SIM Model Object

In the first section, a doubly constrained SIM model was fitted, but nothing really done with it. To do things with it, the object should be stored:

eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count) -> sim_model

Then it is possible to add the fitted \(\hat{T_{ij}}\) values to the eastmids_ttw data frame.

eastmid_ttw %>% add_sim_fitted(sim_model) -> eastmid_ttw_fitted
eastmid_ttw_fitted
#> # A tibble: 1,600 x 5
#>    From         To           Count  Dist    Fitted
#>    <chr>        <chr>        <int> <dbl>     <dbl>
#>  1 Amber Valley Amber Valley 23506   0   19272    
#>  2 Ashfield     Amber Valley  2568  15.6  2803    
#>  3 Bassetlaw    Amber Valley    96  47.9    77.8  
#>  4 Blaby        Amber Valley    20  53.5    17.3  
#>  5 Bolsover     Amber Valley  2513  24.3   812    
#>  6 Boston       Amber Valley     3  97.1     0.179
#>  7 Broxtowe     Amber Valley  1733  14.9  2073    
#>  8 Charnwood    Amber Valley    96  39.0   176    
#>  9 Chesterfield Amber Valley   820  25.6   931    
#> 10 Corby        Amber Valley     0  78.6     0.835
#> # ... with 1,590 more rows

We can now check that the calibration really is constrained to make the total sums going in to the destinations in the fitted values agree with those in the observed counts:

eastmid_ttw_fitted %>% group_by(To) %>% 
  summarise(Count=sum(Count),Fitted=sum(Fitted))
#> # A tibble: 40 x 3
#>    To           Count Fitted
#>    <chr>        <int>  <dbl>
#>  1 Amber Valley 43359  43359
#>  2 Ashfield     43156  43156
#>  3 Bassetlaw    35376  35376
#>  4 Blaby        40739  40739
#>  5 Bolsover     21819  21819
#>  6 Boston       25053  25053
#>  7 Broxtowe     29693  29693
#>  8 Charnwood    52943  52943
#>  9 Chesterfield 39477  39477
#> 10 Corby        26052  26052
#> # ... with 30 more rows

It seems to! The same could be done for the origins.

Finally, make a map. Here we’ll filter out the fitted trips to Derby and map them, and compare them to the observed trips.

eastmid_ttw_fitted %>% filter(To=='Derby') -> to_derby
to_derby
#> # A tibble: 40 x 5
#>    From         To    Count  Dist   Fitted
#>    <chr>        <chr> <int> <dbl>    <dbl>
#>  1 Amber Valley Derby  8364  13.0 8369    
#>  2 Ashfield     Derby   767  24.4 1997    
#>  3 Bassetlaw    Derby    76  57.9   48.3  
#>  4 Blaby        Derby   119  41.4  152    
#>  5 Bolsover     Derby   518  36.1  408    
#>  6 Boston       Derby    18  97.4    0.356
#>  7 Broxtowe     Derby  1723  15.6 3954    
#>  8 Charnwood    Derby   624  29.2 1177    
#>  9 Chesterfield Derby   391  38.5  410    
#> 10 Corby        Derby     4  69.6    5.10 
#> # ... with 30 more rows

To map this, we need to join the table to_derby to the eastmid_sf simple feature table.

eastmid_sf %>% left_join(to_derby,by=c("name"="From")) -> to_derby_sf

So now mapping is possible…

legend_style <- tm_legend(legend.position=c('right','bottom'),
                          legend.text.size=0.4)
tm_shape(to_derby_sf) + tm_polygons(col='Fitted',style='fisher') + legend_style

tm_shape(to_derby_sf) + tm_polygons(col='Count',style='fisher') + legend_style


  1. Install these packages if you haven’t done already!