simR PackageThe simR package allows spatial interaction models (SIMs) to be fitted easily. In particular:
dplyr ‘pipeline’ approach.Firstly - load the package. Because of the ‘tidyverse’ focus of the package, load this as well:
library(simR)
library(tidyverse)This package essentially works with geographical data. Although one could use SIMs without drawing any maps, the author feels this is essentially lacking. For that reason, it is also worth loading the sf and tmap packages1.
library(sf)
#> Warning: package 'sf' was built under R version 3.4.4
library(tmap)Sample data for the package relates the Local Authority areas in the East Midlands of the UK. Here is the simplefeatures table showing the areas:
data(eastmid_sf)
tm_shape(eastmid_sf) + tm_borders()There is also travel-to-work data - basically these are counts of the number of people travelling from one Local Authority area to another (or to itself). This is organised so that each row corresponds to a single origin-destination pair:
data(eastmid_ttw)
eastmid_ttw
#> # A tibble: 1,600 x 4
#> From To Count Dist
#> <chr> <chr> <int> <dbl>
#> 1 Amber Valley Amber Valley 23506 0
#> 2 Ashfield Amber Valley 2568 15.6
#> 3 Bassetlaw Amber Valley 96 47.9
#> 4 Blaby Amber Valley 20 53.5
#> 5 Bolsover Amber Valley 2513 24.3
#> 6 Boston Amber Valley 3 97.1
#> 7 Broxtowe Amber Valley 1733 14.9
#> 8 Charnwood Amber Valley 96 39.0
#> 9 Chesterfield Amber Valley 820 25.6
#> 10 Corby Amber Valley 0 78.6
#> # ... with 1,590 more rowsThe columns are:
From : The name of the origin Local Authority area.To : The name of the destination Local Authority area.Count : The number of people travelling to work for the origin/destination pair.Dist : The ‘crow flies’ diustancew between the origin and destination.Now fit a model:
eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count)
#> Spatial Interaction Model
#> =========================
#>
#> Variables
#> ---------
#> orig = From
#> dest = To
#> cost = Dist
#> intr = Count
#> ---------------------
#>
#> Model Type: Doubly Constrained
#> Cost Denominator Parameter 8.34
#> Cost Half Life 5.78
#> ---------------------
#>
#> AIC 227127.6
#> ====================================At this stage, this might make little sense unless you are familiar withs SIMs. However, here are a few words of explanation. The doubly_constrained_sim function fits a doubly constrained SIM model - this predicts the interaction between origin and destination - here this is the number of people travelling to work from the origin to the destation, ensuring that the total number of people leaving the origin, and the total number arriving at the destination agree with the observed figures, taking into account a ‘cost deterrence’ effect, where more costly trips are less likely to happen, given these constraints.
The function works like dplyr type functions, so that the arguments can be specified as column names in a data frame, without having to quote them. It returns an object of class ‘sim’ (which inherits from the class ‘glm’).
The model is of the form
\[
\hat{T_{ij}} = k A_iB_j\exp\left(-\frac{ C_{ij}}{\gamma} \right)
\] Where \(A_i\) and \(B_j\) are ‘balancing terms’ to make the constraints apply and \(\gamma\) is a cost deterrence parameter. ‘cost’ here is used broadly, and can generally represent any measure of effort needed to travel from zone \(i\) to zone \(j\). \(\hat{T_{ij}}\) is the interaction between zones \(i\) and \(j\). The ‘hat’ on the \(\hat{T_{ij}}\) denotes that this is the modelled interaction - the actual interaction is \(T_{ij}\) (no hat). This is the Count column in the example data set eastmid_ttw. \(\gamma\) here corresponds to the ‘Cost Denominator Parameter’ in the printout. Since units here are in Km this is just over 8 Km. When the deterrence function is exponential, as in the model here, a ‘half life’ cost can be calculated. Assuming all other characteristics of an origin and destimation are fixed, for every whole number multiple of this cost the amount of interaction is halved. Here this is a distance - and has a value of a little under 6km.
To understand the idea more it is worth reading these two articles by Adam Dennett - https://rpubs.com/adam_dennett/259068 and https://rpubs.com/adam_dennett/257231 - in many ways these are what prompted me to create the simR package.
Note that this is a ‘toy’ data set - ideally one would use more geographical detail, and a more realistic cost distance - such as travel time. Also, the cost when the origin and the destination are the same is generally not zero! However, the computing procedures are the same. In the fullness of time I’ll replace these distances with something more convincing…
In the first section, a doubly constrained SIM model was fitted, but nothing really done with it. To do things with it, the object should be stored:
eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count) -> sim_modelThen it is possible to add the fitted \(\hat{T_{ij}}\) values to the eastmids_ttw data frame.
eastmid_ttw %>% add_sim_fitted(sim_model) -> eastmid_ttw_fitted
eastmid_ttw_fitted
#> # A tibble: 1,600 x 5
#> From To Count Dist Fitted
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 Amber Valley Amber Valley 23506 0 19272
#> 2 Ashfield Amber Valley 2568 15.6 2803
#> 3 Bassetlaw Amber Valley 96 47.9 77.8
#> 4 Blaby Amber Valley 20 53.5 17.3
#> 5 Bolsover Amber Valley 2513 24.3 812
#> 6 Boston Amber Valley 3 97.1 0.179
#> 7 Broxtowe Amber Valley 1733 14.9 2073
#> 8 Charnwood Amber Valley 96 39.0 176
#> 9 Chesterfield Amber Valley 820 25.6 931
#> 10 Corby Amber Valley 0 78.6 0.835
#> # ... with 1,590 more rowsWe can now check that the calibration really is constrained to make the total sums going in to the destinations in the fitted values agree with those in the observed counts:
eastmid_ttw_fitted %>% group_by(To) %>%
summarise(Count=sum(Count),Fitted=sum(Fitted))
#> # A tibble: 40 x 3
#> To Count Fitted
#> <chr> <int> <dbl>
#> 1 Amber Valley 43359 43359
#> 2 Ashfield 43156 43156
#> 3 Bassetlaw 35376 35376
#> 4 Blaby 40739 40739
#> 5 Bolsover 21819 21819
#> 6 Boston 25053 25053
#> 7 Broxtowe 29693 29693
#> 8 Charnwood 52943 52943
#> 9 Chesterfield 39477 39477
#> 10 Corby 26052 26052
#> # ... with 30 more rowsIt seems to! The same could be done for the origins.
Finally, make a map. Here we’ll filter out the fitted trips to Derby and map them, and compare them to the observed trips.
eastmid_ttw_fitted %>% filter(To=='Derby') -> to_derby
to_derby
#> # A tibble: 40 x 5
#> From To Count Dist Fitted
#> <chr> <chr> <int> <dbl> <dbl>
#> 1 Amber Valley Derby 8364 13.0 8369
#> 2 Ashfield Derby 767 24.4 1997
#> 3 Bassetlaw Derby 76 57.9 48.3
#> 4 Blaby Derby 119 41.4 152
#> 5 Bolsover Derby 518 36.1 408
#> 6 Boston Derby 18 97.4 0.356
#> 7 Broxtowe Derby 1723 15.6 3954
#> 8 Charnwood Derby 624 29.2 1177
#> 9 Chesterfield Derby 391 38.5 410
#> 10 Corby Derby 4 69.6 5.10
#> # ... with 30 more rowsTo map this, we need to join the table to_derby to the eastmid_sf simple feature table.
eastmid_sf %>% left_join(to_derby,by=c("name"="From")) -> to_derby_sfSo now mapping is possible…
legend_style <- tm_legend(legend.position=c('right','bottom'),
legend.text.size=0.4)
tm_shape(to_derby_sf) + tm_polygons(col='Fitted',style='fisher') + legend_styletm_shape(to_derby_sf) + tm_polygons(col='Count',style='fisher') + legend_styleInstall these packages if you haven’t done already!↩