A brief explanation on the methodology behind Modal Spatial Interaction Model(s)

Author

Eric Wanjau

Code

# Setup
suppressWarnings(if(!require("pacman")) install.packages("pacman"))

pacman::p_load('tidyverse', 'here', 'sf', 'tmap', 'broom', 'paletteer', 'tidymodels', "osmdata", "statip", "patchwork", "furrr", "stplanr", "poissonreg")

# Set up parallel processing
#plan(multisession, workers = 2)

Note

This is an exploratory notebook of modal SIMs.

The Gravity Distribution Model

Spatial interaction is a broad term encompassing the flow or movement over space which results from a decision process (O’Kelly 2009).

Distribution models are models that estimate the number of trips that occur between each origin zone and each destination zone. Distribution models start from assumptions about group trip making behaviour and the way this is influenced by external factors such as total trip ends and travel costs. The best known of these models is the gravity model. Gravity models are the most widely used types of interaction models Haynes and Fotheringham (2020), and were originally generated from an analogy with Newton’s gravitational law. These estimate trips for each cell in the matrix without directly using the observed trip pattern; therefore they are sometimes called synthetic as opposed to growth factor models Ortúzar and Willumsen (2011).

The first use of a gravity model within a framework of urban mobility was made by Casey Jr (1955), who applied this approach to simulate shopping trips made between the regional towns. The initial formulation of the model was as follows Ortúzar and Willumsen (2011):

\[ T_{ij} = \frac{\alpha P_{i}P_{j}}{d_{ij}^2} \tag{1}\]

where

\(T_{ij}\) are the journeys between origin \(i\) and destination \(j\)

\(P_{i}\) and \(P_{j}\) are the populations of the origin and destination cities for the trips

\(d_{ij}\) is the distance between both zones

\(\alpha\) is a proportionality factor, which allows the units of population to be transformed into trips

This formulation was later thought to be an analogy with the law of gravity too simple and some fundamental modifications were made to the basic model in Equation 1. The overall generated and attracted trips (\(O_i\) and \(D_j\)) were used instead of the populations of the zones and the effect of distance (\(d_{ij}\)) was modelled more efficiently using a decreasing function according to the distance or cost. After making these corrections, the resulting model was as follows:

\[ T_{ij} = \alpha O_iD_jf(c_{ij}) \tag{2}\]

where \(f(c_{ij})\) is the generalized function of journey cost, which represents the disincentive to travel as distance or cost increases.

If the trips are represented in a matrix as below, then in the context of the gravity models different types of models can be defined according to the total outflow or inflow totals. This produces a “family” of spatial interaction models (A. G. Wilson 1971): an origin-constrained model, a destination-constrained model, and a doubly constrained model.

From the trip matrix, two of the possible constraints to consider are:

\[ \sum_j T_{ij} = O_i \tag{3}\]

\[ \sum_i T_{ij} = D_j \tag{4}\]

These constraints show us that the sum of the trips on each of the matrix rows should be equal to the total number of trips generated (\(O_i\)) by the origin zone that the row refers to; analogously, the sum of the trips for each matrix column should correspond to the number of trips attracted (\(D_j\)) by the destination zone the column refers to. Ensuring that both constraints are met requires one to replace the proportionality factor \(\alpha\) by the two balancing factors \(A_i\) and \(B_j\). Making these substitutions yields the following expression:

\[ T_{ij} = A_i O_i B_j D_jf(c_{ij}) \tag{5}\]

This represents the classic version of the doubly constrained gravity model. The singly constrained versions of the model, both in origins and in destinations, can be derived by making one set of balancing factors \(A_i\) or \(B_i\) equal to 1. For instance, in an origin-constrained model, \(B_j = 1\) for all \(j\) and

\[ A_i = \frac{1}{\sum_j D_j f(c_{ij})} \]

\(A_i\) imposes the constraint given in Equation 3, that is, the predicted total interaction volume leaving each origin should equal the known value, \(O_i\). The origin constrained model thus adopts the form:

\[ T_{ij} = A_i O_i D_jf(c_{ij}) \tag{6}\]

The same reasoning and mathematical treatment can be translated to the destination constrained models yielding:

\[ T_{ij} = O_i B_j D_jf(c_{ij}) \tag{7}\]

with

\[ B_j = \frac{1}{\sum_i O_i f(c_{ij})} \]

In the case of a doubly constrained model, the balancing factors are

\[ A_i = \frac{1}{\sum_j D_j f(c_{ij})} \tag{8}\]

and

\[ B_j = \frac{1}{\sum_i A_j f(c_{ij})} \tag{9}\]

These balancing factors are interdependent, which is why in the case of the doubly constrained model the values of both groups are required. In practice, \(A_i\) and \(B_j\) are estimated by performing an iterative process initiated by making all \(B_j = 1\), then by considering \(f(c_{ij})\) and calculating \(A_i\). Following that, the \(B_j\) are once again calculated with these values and the steps are repeated until the process converges at a solution or it gets close enough to a defined stop criterion, which fits the observed data sufficiently well to maintain the trustworthiness of the model (Cordera et al. 2017). This will be illustrated in Section 0.0.4.

0.0.1 Interaction models considering transport mode.

Among the doubly constrained models, (A. Wilson 1970) proposed considering a combined form of trip distribution and modal choice. This Notebook explores Wilson’s concept to simulate the number of journeys made from origin zones \(i\) to the destination zones \(j\) by the transport mode \(m\) using the following 3 equations, see Equation 10

\[ T_{ij}^m = O_i^m D_j^m {d_{ij}^m}^{-\beta_m} A_i^m B_j^m \tag{10}\]

\[ T_{ij}^m = O_i D_j {d_{ij}m}{-\beta_m} A_i B_j\ \]

\[ T_{ij}^m = A_i^m O_i^m B_j D_j {d_{ij}m}{-\beta_m} \]

where

\(T_{ij}^m\) is the number of trips from origin \(i\) to destination \(j\) using mode \(m\)

\(O_i^m\) is the total number of trips generated at origin zone \(i\) by mode \(m\)

\(O_i\) is the total number of trips generated at origin zone \(i\)

\(D_j^m\) is the total number of trips attracted at destination \(j\) by mode \(m\)

\(D_j\) is the total number of trips attracted at destination \(j\)

\(d_{ij}^m\) is the cost (distance) of making a trip from origin \(i\) to destination \(j\) using mode \(m\)

First Equation of the SIMs

Let’s begin with the first equation. The above model can be re-specified as a Poisson regression as shown in Equation 11. This allows us to obtain among other things the coefficient \(\beta_m\) which estimates the effect of travel cost for each mode of transport

\[ T_{ij}^m = A_i^m O_i^m B_j^m D_j^m {d_{ij}^m}^{-\beta_m} \\ \lambda_{ij}^m = exp(\mu_i^m + \alpha_{j}^m - \beta^m \ln d_{ij}^m) \tag{11}\]

\(\mu_i\) and \(\alpha_j\) are the equivalent of the vectors of balancing factors \(A_i\) and \(B_j\), but in regression / log-linear modelling terminology can also be described as either dummy variables or fixed effects. They allow us to to account for the geographical effects of an area; in this case, the total flows emanating from and destining to a given commune.

0.0.2 Data preparation

In this section, we get our data into a suitable form for modeling i.e aggregating flows into \(OD\), \(O_i^m\), \(D_j^m\) etc

Code

# Read in survey data
wtdf_trips_flows <- readRDS("wtdf_trips_flows") %>% 
  st_as_sf(crs = 4326)

df_trips_flows <- wtdf_trips_flows %>% 
  st_drop_geometry()

# Generate flows depending on origin, destination and vehicle characteristics
tmode_flows <- df_trips_flows %>% 
  group_by(orig_code, dest_code, OD, vehic,
           
           # n_hsps, n_unics, n_retails, n_malls,
           # n_smarkets, n_restaurants, n_entmts,
           # n_commercials, building_density, dest_pop,
           # nodes
           ) %>% 
  summarize(flows = n(), OD_dist = mean(OD_dist), network_dist = mean(network_dist)) %>% 
  ungroup() %>% 
  mutate(orig_code = as.character(orig_code)) %>%  
  filter(vehic != "tram")


# Function that generate flow tibbles for each mode
generate_modal_flows <- function(tbl, vehicle, threshold){
  # Filter for desired vehicle and threshold
  tbl = tbl %>% 
    filter(vehic == vehicle) %>% 
    filter(flows > threshold) %>% 
    select(OD, orig_code, dest_code, vehic,
           network_dist, flows)
  
  # Add Oi and Dj
  tbl <- tbl %>% 
  group_by(orig_code) %>% 
  mutate(Oi = sum(flows), .after = orig_code) %>% 
  ungroup() %>% 
  group_by(dest_code) %>% 
  mutate(Dj = sum(flows), .after = dest_code) %>% 
  ungroup()
  
  return(tbl)

  
}



# Narrow down to individual vehicle flows
model_tflows_moto <- generate_modal_flows(tbl = tmode_flows, vehicle = "moto", threshold = 3)

model_tflows_car <- generate_modal_flows(tbl = tmode_flows, vehicle = "car", threshold = 2)

model_tflows_bus <- generate_modal_flows(tbl = tmode_flows, vehicle = "bus", threshold = 2)

model_tflows_bike <- generate_modal_flows(tbl = tmode_flows, vehicle = "bike", threshold = 2)

model_tflows_ebike <- generate_modal_flows(tbl = tmode_flows, vehicle = "ebike", threshold = 2)

model_tflows_taxi <- generate_modal_flows(tbl = tmode_flows, vehicle = "taxi", threshold = 2)

model_tflows_walk <- generate_modal_flows(tbl = tmode_flows, vehicle = "walk", threshold = 2)


# Display top 5 OD trips for each vehicle
model_tflows_moto %>% 
  slice_max(n = 5, flows)

model_tflows_car %>% 
  slice_max(n = 5, flows)

model_tflows_bus %>% 
  slice_max(n = 5, flows)

model_tflows_bike %>% 
  slice_max(n = 5, flows)

model_tflows_ebike %>% 
  slice_max(n = 5, flows)

model_tflows_taxi %>% 
  slice_max(n = 5, flows)

model_tflows_walk %>% 
  slice_max(n = 5, flows)

# A tibble: 5 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 3254-3050 3254        432 3050        309 moto        15.4     159
2 3254-3048 3254        432 3048        323 moto        15.0     137
3 3254-2856 3254        432 2856        449 moto        13.0     136
4 3248-2856 3248        172 2856        449 moto         0.138   114
5 3060-3043 3060        158 3043        249 moto         9.79    108

# A tibble: 5 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 3020-3008 3020        332 3008        160 car          16.7    124
2 2929-2882 2929        313 2882        204 car           5.09   104
3 2993-3015 2993        176 3015        248 car          12.2     96
4 3055-3043 3055        110 3043        284 car           7.78    95
5 2927-3024 2927        159 3024        106 car          10.4     92

# A tibble: 7 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 2877-2877 2877          9 2877         17 bus          0.585     9
2 3058-2999 3058          9 2999         10 bus          5.84      6
3 3251-2924 3251          6 2924         13 bus          4.19      6
4 3258-2992 3258         19 2992          9 bus          1.83      6
5 2927-2877 2927          5 2877         17 bus          3.59      5
6 3250-2923 3250         11 2923          5 bus          3.27      5
7 3252-3252 3252          5 3252          5 bus          0.878     5

# A tibble: 5 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 3251-3251 3251         55 3251        136 bike          3.07    55
2 3250-3251 3250        126 3251        136 bike          4.32    44
3 2923-3258 2923         42 3258         43 bike          6.50    39
4 2927-2923 2927         37 2923         56 bike          2.02    37
5 2883-3335 2883         48 3335         95 bike          4.29    32

# A tibble: 7 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 2889-3335 2889         82 3335         75 ebike         5.15    75
2 2853-2877 2853         94 2877         68 ebike         3.89    30
3 3258-2880 3258         34 2880         88 ebike         6.48    30
4 2853-2880 2853         94 2880         88 ebike         3.41    29
5 2853-3052 2853         94 3052         37 ebike         6.03    28
6 2855-2884 2855        117 2884         70 ebike         5.20    28
7 2992-2879 2992         48 2879         37 ebike         7.05    28

# A tibble: 6 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 3020-3054 3020         19 3054         19 taxi       15.2       19
2 2877-2877 2877         16 2877         13 taxi        0.441     13
3 2922-2922 2922         13 2922         10 taxi        0.183     10
4 2857-2855 2857         13 2855         13 taxi        1.33       7
5 2855-2855 2855         11 2855         13 taxi        0.0600     6
6 2857-2857 2857         13 2857         11 taxi        0.281      6

# A tibble: 5 x 8
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int>
1 2923-2923 2923         43 2923         33 walk         0.230    33
2 2924-2924 2924         33 2924         50 walk         0.661    33
3 3258-3258 3258         31 3258         22 walk         0.457    22
4 2846-2846 2846         19 2846         22 walk         0.666    19
5 3252-3252 3252         22 3252         21 walk         1.07     18

0.0.3 Model flows

In this section, we model flows as in Equation 11 and obtain \(\beta\) for each mode:

Code

set.seed(2056)
library(tidymodels)
library(poissonreg)
metrics <- metric_set(rmse, rsq)

# Create a Poisson Model specification
ps_reg <- poisson_reg() %>% 
  set_engine("glm") %>% 
  set_mode("regression")

# Create a recipe
ps_reg_rec <- recipe(flows ~ orig_code + 
                       dest_code + network_dist,
                     data = model_tflows_moto) %>% 
  # Log cost
  step_log(network_dist) %>% 
  step_dummy(all_nominal_predictors())

# Combine model specification and recipe into a workflow
ps_reg_wf <- workflow() %>% 
  add_recipe(ps_reg_rec) %>% 
  add_model(ps_reg)


# Functions that fits a model to the data and binds flow predictions
fit_predict <- function(tbl){
  # Fit model
  mdl = ps_reg_wf %>% 
    fit(tbl)
  
  # Make predictions
  tbl <- mdl %>% 
    augment(tbl) %>% 
    mutate(.pred = round(.pred)) %>% 
    relocate(.pred, .after = flows) %>% 
    mutate(beta = mdl %>% 
             tidy() %>% 
             filter(str_detect(term, "dist")) %>% 
             pull(estimate))
  
  return(tbl)
  
}


# Fit model to motobike data
model_tflows_moto <- fit_predict(model_tflows_moto)

# Fit model to car data
model_tflows_car <- fit_predict(model_tflows_car)

# Fit model to bike data
model_tflows_bike <- fit_predict(model_tflows_bike)

# Fit model to ebike data
model_tflows_ebike <- fit_predict(model_tflows_ebike)

# Fit model to taxi data
model_tflows_taxi <- fit_predict(model_tflows_taxi)

# Fit model to walk data
model_tflows_walk <- fit_predict(model_tflows_walk)

# Fit model to bus data
model_tflows_bus <- fit_predict(model_tflows_bus)

# Bind results into one tibble
tm_flows <- model_tflows_moto %>% 
  bind_rows(model_tflows_car) %>% 
  bind_rows(model_tflows_bus) %>% 
  bind_rows(model_tflows_bike) %>% 
  bind_rows(model_tflows_ebike) %>% 
  bind_rows(model_tflows_taxi) %>% 
  bind_rows(model_tflows_walk)

# Sample results
tm_flows %>% 
  group_by(vehic) %>% 
  slice_max(n = 1, .pred, with_ties = FALSE)

# A tibble: 7 x 10
# Groups:   vehic [7]
  OD        orig_code    Oi dest_code    Dj vehic network_dist flows .pred      beta
  <chr>     <chr>     <int> <chr>     <int> <chr>        <dbl> <int> <dbl>     <dbl>
1 3251-3251 3251         55 3251        136 bike         3.07     55    55  2.57e- 1
2 2877-2877 2877          9 2877         17 bus          0.585     9     9  1.58e-16
3 3020-3008 3020        332 3008        160 car         16.7     124   118 -9.63e- 2
4 2889-3335 2889         82 3335         75 ebike        5.15     75    75 -2.53e- 1
5 3254-2856 3254        432 2856        449 moto        13.0     136   157 -6.66e- 2
6 3020-3054 3020         19 3054         19 taxi        15.2      19    19 -6.06e- 3
7 2923-2923 2923         43 2923         33 walk         0.230    33    33 -1.98e+ 0

Code

# Model coefficiets
tm_flows %>% 
  distinct(vehic, beta)

# A tibble: 7 x 2
  vehic      beta
  <chr>     <dbl>
1 moto  -6.66e- 2
2 car   -9.63e- 2
3 bus    1.58e-16
4 bike   2.57e- 1
5 ebike -2.53e- 1
6 taxi  -6.06e- 3
7 walk  -1.98e+ 0

Code

# Overall performance metrics
metrics(tm_flows, truth = flows, estimate = .pred)

# A tibble: 2 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard       9.29 
2 rsq     standard       0.661

Our previous performance metrics were at:

rmse: 11.7798 and rsq: 0.3621 so a good improvement in reproducing the original trip matrices.

The bulk of the representational and policy relevance advantages of the gravity model lies in the deterrence function. Now that we have relatively usable deterrence factors \(\beta\) for each mode that sufficiently reproduce the observed trips, we can go ahead and illustrate how to calibrate the balancing factors \(A_i\) and \(B_j\):

0.0.4 Calibrating balancing factors

The balancing factors in Equation 8 and Equation 9 are interdependent, this means that the calculation of one set requires the values of another set. This suggests an iterative process similar to Furness (1965) which works well in practice.

Given a set of values for the deterrence function \(f(c_{ij})\):

set all \(B_j = 1.0\) and solve \(A_i\). In this context, solve for \(A_i\) means finding the correction factors that satisfy the trip generation constraints
with the latest \(A_i\) solve for \(B_j\), i.e correction factors that satisfy trip attraction constraints
keeping the \(B_j\)’s fixed, solve for \(A_i\) and repeat steps (2) and (3) until the changes are sufficiently small.

Code

# Function to calculate balancing factors

get_bf <- function(tbl, desired_convergence = 0.001, verbose = FALSE){
  
  # Determine type of deterrence function
  network_dist = ps_reg_rec$template %>% 
    dplyr::slice(1) %>% 
    pull(network_dist)
  
  prep_network_dist = ps_reg_rec %>% 
    prep() %>% juice() %>% dplyr::slice(1) %>%
    pull(network_dist)
  
  if(prep_network_dist == log(network_dist)){
    deterrence_function = "power_function"
  } else{
    deterrence_function = "exponential_function"
  }
  
  # Set starting values as per algorithm
  tbl <- tbl %>% 
    mutate(
      Ai = 1,
      Bj = 1,
      oldAi = 5,
      oldBj = 5,
      diff = abs((oldAi - Ai) / oldAi)
    )
  
  # Create convergence value and count
  convergence = 1
  count = 0
  
  while (convergence > desired_convergence) {
    
    # Increment count
    count = count + 1
    if(verbose){
      print(paste0("iteration ", count))
    }
    
    
    
    # Calibrate Ai
    if(deterrence_function == "power_function"){
    tbl = tbl %>% 
      mutate(Ai = Bj * Dj * (network_dist^beta)) %>%
      
      group_by(orig_code) %>% 
      mutate(Ai = 1/sum(Ai)) %>% 
      ungroup()
    } else {
    tbl = tbl %>% 
      mutate(Ai = Bj * Dj * exp(network_dist*beta)) %>%
      
      group_by(orig_code) %>% 
      mutate(Ai = 1/sum(Ai))%>% 
      ungroup()
    }
    
    # Now, if not the first iteration, calculate the
    # difference between  the new Ai values and the old
    # Ai values and once done, overwrite the old Ai
    # values with the new ones.
    
    if(count == 1){
    tbl <- tbl %>% 
      mutate(oldAi = Ai)
  } else {
    tbl <- tbl %>% 
      mutate(diff = abs((oldAi - Ai) / oldAi),
             oldAi = Ai)
  }
    
    # Optionally print out values
    cnvg_ai = tbl %>% 
      pull(diff) %>% sum()
    if(verbose){
      print(paste("Ai convergence", cnvg_ai))
    }
    
    # Now similar calibration for Bj
    
    
    if(deterrence_function == "power_function"){
    tbl = tbl %>% 
      mutate(Bj = Ai * Oi * (network_dist^beta)) %>%
      
      group_by(dest_code) %>% 
      mutate(Bj = 1/sum(Bj))%>% 
      ungroup()
    } else {
    tbl = tbl %>% 
      mutate(Bj = Ai * Oi * exp(network_dist*beta)) %>%
      
      group_by(dest_code) %>% 
      mutate(Bj = 1/sum(Bj))%>% 
      ungroup()
    }
    
    # Now, if not the first iteration, calculate the
    # difference between  the new Bj values and the old
    # Bj values and once done, overwrite the old Bj
    # values with the new ones.
    
    if(count == 1){
    tbl <- tbl %>% 
      mutate(oldBj = Bj)
  } else {
    tbl <- tbl %>% 
      mutate(diff = abs((oldBj - Bj) / oldBj),
             oldBj = Bj)
  }
    
    # Optionally print out values
    convergence = tbl %>% 
      pull(diff) %>% sum()
    if(verbose){
      print(paste("Bj convergence", convergence))
    }
    
  }
  tbl <- tbl %>% 
    select(-contains("old"), -diff)
  
  return(tbl)

  
  
}

Let’s verify whether the above works by

calibrating the balancing factors
plugging them into the multiplicative formula \(T_{ij} = A_i^m O_i^m B_j^m D_jf(c_{ij^m})\) and predicting flows
comparing predicted flows with what we had predicted earlier (.pred) using Poisson regression

Code

# Calibrate balancing factors for taxis
taxi_flows <- get_bf(tbl = model_tflows_taxi, verbose = TRUE)

[1] "iteration 1"
[1] "Ai convergence 13.6"
[1] "Bj convergence 13.6"
[1] "iteration 2"
[1] "Ai convergence 0.428231613651519"
[1] "Bj convergence 0.225746458938562"
[1] "iteration 3"
[1] "Ai convergence 0.130838680607008"
[1] "Bj convergence 0.0855563992693574"
[1] "iteration 4"
[1] "Ai convergence 0.0484652338935034"
[1] "Bj convergence 0.0338555658035617"
[1] "iteration 5"
[1] "Ai convergence 0.019016494717877"
[1] "Bj convergence 0.0136114956169525"
[1] "iteration 6"
[1] "Ai convergence 0.00762022863248326"
[1] "Bj convergence 0.00550659011246537"
[1] "iteration 7"
[1] "Ai convergence 0.00307871545230649"
[1] "Bj convergence 0.00223326995688755"
[1] "iteration 8"
[1] "Ai convergence 0.00124794585548403"
[1] "Bj convergence 0.000906643477889347"

Code

#Predict flows using multiplicative formulae
taxi_flows %>% 
  mutate(bf_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)),
         diff_flows = .pred - bf_flows) %>% 
  select(OD, Ai, Oi, Bj, Dj, network_dist, beta, bf_flows, .pred, diff_flows)

# A tibble: 17 x 10
   OD            Ai    Oi    Bj    Dj network_dist     beta bf_flows .pred
   <chr>      <dbl> <int> <dbl> <int>        <dbl>    <dbl>    <dbl> <dbl>
 1 2839-2839 0.255      3 0.643     6       0.0735 -0.00606        3     3
 2 2855-2855 0.0413    11 0.998    13       0.0600 -0.00606        6     6
 3 2855-2857 0.0413    11 1.00     11       1.33   -0.00606        5     5
 4 2857-2855 0.0416    13 0.998    13       1.33   -0.00606        7     7
 5 2857-2857 0.0416    13 1.00     11       0.281  -0.00606        6     6
 6 2877-2839 0.0488    16 0.643     6       1.53   -0.00606        3     3
 7 2877-2877 0.0488    16 1.27     13       0.441  -0.00606       13    13
 8 2922-2922 0.0764    13 0.997    10       0.183  -0.00606       10    10
 9 2922-2923 0.0764    13 1.01      3       1.48   -0.00606        3     3
10 2928-2928 0.331      3 1         3       0.382  -0.00606        3     3
11 3020-3054 0.0535    19 1        19      15.2    -0.00606       19    19
12 3118-3118 0.332      3 1         3       0.458  -0.00606        3     3
13 3245-3259 0.336      3 1         3       2.95   -0.00606        3     3
14 3250-3250 0.249      4 1         4       0.433  -0.00606        4     4
15 3291-3291 0.332      3 1         3       0.436  -0.00606        3     3
16 3302-3302 0.200      5 1         5       0.756  -0.00606        5     5
17 3335-3335 0.328      3 1         3       0.0726 -0.00606        3     3
# ... with 1 more variable: diff_flows <dbl>

Worked! The predicted flows are exactly the same.

0.0.5 Predicting flows in case of ban

As stated earlier, these type of models can estimate trips for each cell in the matrix without directly using the observed trip pattern, provided future values for parameters like \(O_i\) and \(D_j\) are available or estimable. So let’s see how we would predict flows in case of a motorbike ban.

We’ll use survey response to identify motorbike users alternative vehicles and how this would change the total number of modal flows generated/attracted at different communes.

Note

Train option as an alternative means will be converted to bus since the train flows were too few to be modelled.

Code

library(tidytext)
set.seed(2056)
# Identify OD communes that motorbike users were previously using
OD_moto <- model_tflows_moto %>% 
  distinct(OD) %>% 
  pull(OD)

# Since respondents provide alternative means, just randomly pick 1
scenario_moto <- df_trips_flows %>% 
  mutate(across(contains("_code"), as.character)) %>% 
  filter(vehic == "moto") %>% 
  filter(OD %in% OD_moto) %>% 
  unnest_tokens(output = scenario_vehic, input = alt_veh, token = "words", drop = F) %>% 
    relocate(scenario_vehic, .after = alt_veh) %>% 
  group_by(id) %>% 
  slice_sample(n = 1) %>% 
  ungroup() %>% 
  group_by(orig_code, dest_code, OD, scenario_vehic) %>% 
  summarize(flows = n(), network_dist = mean(network_dist)) %>% 
  ungroup() %>% 
  # Make train bus
  mutate(scenario_vehic = case_when(
    scenario_vehic == "lighttrain" ~ "bus",
    TRUE ~ scenario_vehic
  ))

# Display summary of what motorbike users result to
scenario_moto %>% 
  group_by(scenario_vehic) %>% 
  summarise(flows = sum(flows)) %>% 
  arrange(-flows)

# A tibble: 6 x 2
  scenario_vehic flows
  <chr>          <int>
1 bus             3016
2 car             2233
3 taxi            1308
4 ebike           1256
5 bike            1206
6 walk             849

The tibble above shows an example of a scenario and what motorbike respondents would result to. Using this info, let’s update our \(O_i^m\) s and \(D_j^m\)s and estimate flows.

Code

# Add alternative flows to existing ones
new_tm_flows <- scenario_moto %>% 
  rename(vehic = scenario_vehic) %>% 
  mutate(.pred = 0, .after = flows) %>% 
  bind_rows(
    tm_flows %>%
      filter(vehic != "moto") %>% 
      select(all_of(scenario_moto %>% 
  rename(vehic = scenario_vehic) %>% 
    mutate(.pred = 0, .after = flows) %>% 
    colnames()))
  ) %>% 
  group_by(orig_code, dest_code, OD, vehic) %>% 
  summarise(flows = sum(flows), 
            network_dist = mean(network_dist)) %>% 
  ungroup()

# Generate new Ois and Djs 
new_tm_flows <- new_tm_flows %>% 
  group_by(orig_code, vehic) %>% 
  mutate(Oi = sum(flows), .after = orig_code) %>%
  ungroup() %>% 
  group_by(dest_code, vehic) %>%
  mutate(Dj = sum(flows), .after = dest_code) %>%
  ungroup() %>% 
  left_join(tm_flows %>% distinct(vehic, beta))

new_tm_flows %>% 
  select(OD, Oi, Dj, vehic, network_dist, beta) %>% 
  slice_sample(n = 5)

# A tibble: 5 x 6
  OD           Oi    Dj vehic network_dist     beta
  <chr>     <int> <int> <chr>        <dbl>    <dbl>
1 3004-2855    14    57 ebike        3.13  -0.253  
2 2877-2857   134    79 car          3.44  -0.0963 
3 2889-2927     3    18 taxi         1.31  -0.00606
4 3340-3043    36   338 car          5.69  -0.0963 
5 2922-2922    42    32 taxi         0.140 -0.00606

Say the above were sample future estimates of \(O_i\)s and \(D_j^m\)s, flows can be estimated using Equation 11

Code

# Recalibrate new Ais and Djs to satisfy new trip constraints
# and make new flow estimates
# future_map runs models in parallel
new_tm_flows <- new_tm_flows %>% 
  split(.$vehic) %>% 
  future_map_dfr(~ get_bf(tbl = .x)) %>% 
  mutate(scenario_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta))) %>% 
  select(OD, vehic, Ai, Oi, Bj, Dj, network_dist, beta, scenario_flows, observed_flows = flows)

new_tm_flows %>% 
  slice_sample(n = 7)

# A tibble: 7 x 10
  OD        vehic      Ai    Oi       Bj    Dj network_dist     beta scenario_flows
  <chr>     <chr>   <dbl> <int>    <dbl> <int>        <dbl>    <dbl>          <dbl>
1 2855-2843 bike  0.00708    39 1.03        22         1.29  0.257                7
2 3245-3249 taxi  0.0156     38 1.69         1         3.07 -0.00606              1
3 2926-2929 taxi  0.00660    21 0.500       55         4.18 -0.00606              4
4 2926-3259 car   0.00173    23 1.48        60         1.15 -0.0963               3
5 3006-2841 ebike 0.00200    39 1.70        11         3.06 -0.253                1
6 3251-3087 bike  0.00115   136 1.02        45        14.8   0.257               14
7 2839-2841 walk  0.0414     13 0.000953     8         1.23 -1.98                 0
# ... with 1 more variable: observed_flows <int>

The new usage of transport modes associated with this scenario is:

Code

# Observe changes in flows
tm_flows %>% 
  group_by(vehic) %>% 
  summarise(observed_flows = sum(flows)) %>% 
  left_join(new_tm_flows %>% 
  group_by(vehic) %>% 
  summarise(scenario_flows = sum(scenario_flows))) %>% 
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)),
         diff_flows = scenario_flows - observed_flows) %>% 
  arrange(-diff_flows)

# A tibble: 7 x 4
  vehic observed_flows scenario_flows diff_flows
  <chr>          <dbl>          <dbl>      <dbl>
1 bus              110           3124       3014
2 car             5060           7291       2231
3 taxi              99           1404       1305
4 ebike            876           2134       1258
5 bike             870           2094       1224
6 walk             287           1127        840
7 moto            9868              0      -9868

0.0.6 Investigate changes associated with ebike flows

OD-lines

Code

tmap_mode("view")
# Read in shapefile data
# Load VN full data set
vn_orig <- st_read("data-synced/raw_data/VN.shp", quiet = T)

# Load commune data
hn <- st_read("data-synced/raw_data/Hanoi_commune.shp", quiet = T)

# Create a HN outline
hn_out <- hn %>% 
  st_union() 

# Put a 5KM buffer around it - probably account for islands in Hanoi?
hn_buf <- hn_out %>% 
  st_transform(3405) %>% 
  st_buffer(dist = 5000)

# Subset VN data
vn = vn_orig[st_transform(hn_buf, 4326), ]

# Assign area codes to column code
vn <- vn %>% 
  rownames_to_column(var = "code") 
 
# Previous OD lines
travel_network_ebike_observed <- tm_flows %>% 
  filter(vehic == "ebike") %>% 
  relocate(c(orig_code, dest_code)) %>% 
  od2line(vn)

p1 = tm_shape(st_jitter(travel_network_ebike_observed)) +
  tm_lines(lwd = "flows", scale = 7) +
  tm_shape(vn) +
  tm_borders(alpha = 0.1) +
  tm_basemap("OpenStreetMap")

# New OD lines
travel_network_ebike_scenario <- new_tm_flows %>% 
  filter(vehic == "ebike") %>% 
  filter(scenario_flows > 0) %>% 
  separate(OD, c("orig_code", "dest_code")) %>% 
  relocate(c(orig_code, dest_code)) %>% 
  od2line(vn)

p2 = tm_shape(st_jitter(travel_network_ebike_scenario)) +
  tm_lines(lwd = "scenario_flows", scale = 7) +
  tm_shape(vn) +
  tm_borders(alpha = 0.1) +
  tm_basemap("OpenStreetMap")

tmap_arrange(p1, p2, ncol = 2)

Change in flows at destinations associated with ebikes

Code

flows_change <- travel_network_ebike_observed %>% 
  as_tibble() %>% 
  select(dest_code, Dj) %>% 
  right_join(
    travel_network_ebike_scenario %>% 
      as_tibble() %>%
      select(dest_code, scenario_Dj = Dj)) %>%
    distinct(dest_code, Dj, scenario_Dj) %>% 
  mutate(across(where(is.numeric), ~ coalesce(.x, 0))) %>% 
  mutate(diff_flows = scenario_Dj - Dj) %>% 
  left_join(vn, by = c("dest_code" = "code")) %>% 
  st_as_sf()

tm_shape(flows_change) +
  tm_fill(col = "diff_flows", palette = c("#ECC463FF", "#026CCBFF", "#BBD961FF", "#007A33FF", "#9F248FFF", "#D50032FF"), alpha = 0.9) +
  tm_basemap("OpenStreetMap")

Observe changes in routes

Code

# Select route: Have to find a better way of doing it
weighted_dfm <- wtdf_trips_flows %>%
  group_by(OD) %>% 
  summarize(mode_dist = max(statip::mfv(network_dist)))


# Previous routes
tmap_options(bg.color = "grey10", basemaps.alpha = 0.5)
p1 = tm_shape(vn) +
  tm_fill(col = "grey10", alpha = 0.8) +
  tm_borders(col = "grey10", alpha = 0.7) +
  tm_basemap(server = "OpenStreetMap") +
  tm_shape(travel_network_ebike_observed %>% 
  st_drop_geometry() %>% 
  left_join(weighted_dfm, by = "OD") %>% 
  st_as_sf(crs = 4326) %>%
  filter(!st_is_empty(.))) +
  tm_lines(lwd = "flows", scale = 5, col = "#ccffff") 


# Scenario routes routes
p2 = tm_shape(vn) +
  tm_fill(col = "grey10", alpha = 0.8) +
  tm_borders(col = "grey10", alpha = 0.7) +
  tm_shape(travel_network_ebike_scenario %>%
  mutate(OD = paste(orig_code, dest_code, sep = "-")) %>% 
  st_drop_geometry() %>% 
  left_join(weighted_dfm, by = "OD") %>% 
  st_as_sf(crs = 4326) %>% filter(!st_is_empty(.)) ) +
  tm_lines(lwd = "scenario_flows", scale = 5, col = "#ccffff") +
    tm_basemap("OpenStreetMap")

tmap_arrange(p1, p2, ncol = 2)

0.1 Second Equation

Going back to the first equation in Equation 10, it allows us to analyze flows on a per modal basis independently of each other - which removes the entire concept of interaction. We expect that if we make changes in a particular mode, we would expect that change to affect the rest as well? It also requires careful consideration of future values of \(Oi^m\) and \(Dj^m\).

Let’s take a look at the second equation which is more aggregated than the first:

\[ T_{ij}^m = O_i D_j {d_{ij}^m}^{-\beta_m} A_i B_j \tag{12}\]

Note

Share a lot of similarities to the poster presentation at GISRUK 2022

In this section, we’ll model flows as in Equation 12

Code

# Aggregate Ois and Djs
agg_tmode_flows <- tm_flows %>% 
  group_by(orig_code) %>% 
  mutate(Oi = sum(flows)) %>% 
  ungroup() %>% 
  group_by(dest_code) %>% 
  mutate(Dj = sum(flows)) %>% 
  ungroup() %>% 
  rename(pred1 = .pred) %>% 
  rename(beta1 = beta)

# Model flows
agg_ps_mdl = glm(flows ~ orig_code + dest_code +
              ( log(network_dist):vehic ), family = poisson(link = "log"), data = agg_tmode_flows)





# Add predicted flows
agg_tmode_flows <- agg_tmode_flows %>% 
  mutate(.pred = round(fitted(agg_ps_mdl))) %>% 
  # Add beta
  left_join(agg_ps_mdl %>% 
  tidy() %>% 
  filter(str_detect(term, "vehic")) %>%
  mutate(vehic = str_split_fixed(term, pattern = "vehic",2)[,2]) %>% 
     select(vehic, beta = estimate))

# Performance metrics
metrics(agg_tmode_flows, truth = flows, estimate = .pred)

# A tibble: 2 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard      11.7  
2 rsq     standard       0.462

Quite similar to GISRUK’s presentation but higher rsq because this is a doubly constrained model and can explain more what is happening than the former origin constrained model.

Calibration of \(A_i\) and \(B_j\) values will be similar as before.

Code

# Calibrate Ais an Bjs
agg_tmode_flows <- get_bf(agg_tmode_flows)



#Predict flows using multiplicative formulae
agg_tmode_flows %>% 
  mutate(bf_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)),
         diff_flows = .pred - bf_flows) %>% 
  filter(vehic == "bus") %>% 
  select(OD, Ai, Oi, Bj, Dj, network_dist, beta, bf_flows, .pred, diff_flows)

# A tibble: 28 x 10
   OD             Ai    Oi    Bj    Dj network_dist   beta bf_flows .pred diff_flows
   <chr>       <dbl> <int> <dbl> <int>        <dbl>  <dbl>    <dbl> <dbl>      <dbl>
 1 2843-3339 4.27e-4    89 0.899   172        1.94  -0.397        5     5          0
 2 2845-2927 4.16e-4    81 0.479   259        2.74  -0.397        3     3          0
 3 2855-2857 1.58e-4   246 0.984   293        1.33  -0.397       10    10          0
 4 2857-2855 2.16e-4   123 0.914   308        1.33  -0.397        7     7          0
 5 2877-2877 8.08e-5   395 0.554   516        0.585 -0.397       11    11          0
 6 2883-2877 1.58e-4   204 0.554   516        1.89  -0.397        7     7          0
 7 2924-2922 1.98e-4   282 0.700   230        2.11  -0.397        7     7          0
 8 2924-2924 1.98e-4   282 0.439   363        1.17  -0.397        8     8          0
 9 2924-2929 1.98e-4   282 0.443   553        3.20  -0.397        9     9          0
10 2927-2877 1.94e-4   316 0.554   516        3.59  -0.397       11    11          0
# ... with 18 more rows

Again, calibration of \(A_i\) and \(B_i\) produces flows (bf_flows) that are similar to the predicted ones (.pred).

0.1.1 Investigating results of ban.

0.1.1.1 Increase in cost.

Equation 12 allows us to estimate the effects of modal cost on flows - which would be pretty useful if our cost parameter was monetary. Let’s explore how the effects of cost of a trip vary with the mode of transport

Code

dist_coef <- agg_ps_mdl %>% 
  tidy() %>% 
  filter(str_detect(term, "vehic")) %>%
  mutate(vehic = str_split_fixed(term, pattern = "vehic",2)[,2]) %>% 
     select(vehic, beta = estimate)

dist_coef %>% 
  mutate(exp_beta = exp(beta))

# A tibble: 7 x 3
  vehic    beta exp_beta
  <chr>   <dbl>    <dbl>
1 bike   0.213     1.24 
2 bus   -0.397     0.672
3 car    0.0851    1.09 
4 ebike  0.0799    1.08 
5 moto   0.0343    1.03 
6 taxi   0.0446    1.05 
7 walk   0.115     1.12

These coefficients allow us to estimate the impact of travel cost on flows depending on mode of transport and insights on how trips could potentially vary with different costs of making a trip. For instance, bus flows change by 0.6723852 (decrease of about 32%) for every increase of a standard deviation of bus costs. Let’s explore how this would affect the various modal flows by increasing bus cost by a factor 10:

Code

# Increase bus cost by a factor of 10
bc_tmode_flows <- agg_tmode_flows %>%  
  mutate(network_dist = case_when(
    vehic == "bus" ~ network_dist*10.5,
    TRUE ~ network_dist
  ))

# Calibrate Ai and Bj to match new costs
bc_tmode_flows <- get_bf(bc_tmode_flows) %>% 
  ungroup() %>% 
  mutate(scenario_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)))

# Compare change in flows:
bc_tmode_flows %>% 
  group_by(vehic) %>% 
  summarise(est_flows = sum(.pred)) %>% 
  left_join(
    bc_tmode_flows %>% 
      group_by(vehic) %>% 
      summarise(scenario_flows = sum(scenario_flows))) %>% 
  mutate(diff_flows = scenario_flows- est_flows) %>% 
  arrange(-diff_flows)

# A tibble: 7 x 4
  vehic est_flows scenario_flows diff_flows
  <chr>     <dbl>          <dbl>      <dbl>
1 moto       9309           9376         67
2 car        5172           5195         23
3 bike        885            903         18
4 walk        325            337         12
5 ebike      1108           1119         11
6 taxi        163            167          4
7 bus         212             94       -118

This shows the impact on the flows to buses as well as to other means of transport.

0.1.1.2 Change in modes of transport.

Another scenario that this model allows us to estimate is the impact of changing modes of transport. Since the data is already aggregated at the origins and destinations, we’ll have to design a scenario ourselves. Let’s explore what would be the effect of a hypothetical scenario motorbike users result to cars for distances greater than 10 KM.

Code

# Change alternative to cars for motorbike dist > 10
car_tmode_flows <- agg_tmode_flows %>%
  mutate(vehic_prev = vehic) %>% 
  mutate(vehic = case_when(vehic == "moto" & network_dist > 10
 ~ "car", TRUE ~ vehic)) %>% 
  rename(beta2 = beta) %>% left_join(agg_tmode_flows %>% distinct(vehic, beta))

# Adjust balancing factors
car_tmode_flows <-  get_bf(car_tmode_flows) %>% 
  mutate(alt_mode_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)))

car_tmode_flows %>% 
  group_by(vehic_prev) %>% 
  summarise(est_flows = sum(.pred)) %>% 
  rename(vehic = vehic_prev) %>% 
  left_join(
    car_tmode_flows %>% 
      group_by(vehic) %>% 
      summarise(scenario_flows = sum(alt_mode_flows))) %>% 
  mutate(diff_flows = scenario_flows - est_flows) %>% 
  arrange(-diff_flows)

# A tibble: 7 x 4
  vehic est_flows scenario_flows diff_flows
  <chr>     <dbl>          <dbl>      <dbl>
1 car        5172           6918       1746
2 bus         212            211         -1
3 walk        325            324         -1
4 taxi        163            161         -2
5 bike        885            876         -9
6 ebike      1108           1094        -14
7 moto       9309           7588      -1721

Third model

Another interesting model that I have explored is as below:

\[ T_{ij}^m = A_i^m O_i^m B_j D_j {d_{ij}^m}^{-\beta_m} \tag{13}\]

This equation shares similar aspects with those presented earlier. It allows us to capture an area’s emissivity of a given mode \(O_i^m\) while at the same time ensuring “interaction” by sharing attraction trip ends \(D_j\).

Let’s see how we would go about it.

Modeling flows according to Equation 13

Code

# Aggregate Dj
agg_tm_flows <-  tm_flows %>%
  group_by(dest_code) %>%
  mutate(Dj = sum(flows)) %>%
  ungroup()


# Calibrate for beta
dest_agg_ps_mdl = glm(flows ~ (orig_code:vehic) + dest_code + ( log(network_dist):vehic ), family = poisson(link = "log"), data = agg_tm_flows)

# Use values
agg_tm_flows <- agg_tm_flows %>% 
  mutate(.pred = round(fitted(dest_agg_ps_mdl))) %>% 
  rename(beta0 = beta) %>% 
  left_join(
    dest_agg_ps_mdl %>% 
      tidy() %>% 
      filter(str_detect(term, "dist")) %>% 
      mutate(term = str_split_fixed(term, ":", 2)[,1] %>% 
      str_remove("vehic")) %>% 
      select(vehic = term, beta = estimate))
  


# Metrics
metrics(agg_tm_flows, truth = flows, estimate = .pred)

# A tibble: 2 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard      10.2  
2 rsq     standard       0.589

Code

# cost coefficients 
agg_tm_flows %>% 
  distinct(vehic, beta)

# A tibble: 7 x 2
  vehic    beta
  <chr>   <dbl>
1 moto  -0.0583
2 car    0.0408
3 bus   -0.283 
4 bike   0.171 
5 ebike  0.0557
6 taxi  -0.244 
7 walk  -0.413

Calibration of \(A_i^m\) is different from the previous models since we consider each origin with respect to the flows of a particular mode as below:

Code

# Function to calculate balancing factors

get_bf_da <- function(tbl, desired_convergence = 0.001, verbose = FALSE){
  
  # Determine type of deterrence function
  network_dist = ps_reg_rec$template %>% 
    dplyr::slice(1) %>% 
    pull(network_dist)
  
  prep_network_dist = ps_reg_rec %>% 
    prep() %>% juice() %>% dplyr::slice(1) %>%
    pull(network_dist)
  
  if(prep_network_dist == log(network_dist)){
    deterrence_function = "power_function"
  } else{
    deterrence_function = "exponential_function"
  }
  
  # Set starting values as per algorithm
  tbl <- tbl %>% 
    mutate(
      Ai = 1,
      Bj = 1,
      oldAi = 5,
      oldBj = 5,
      diff = abs((oldAi - Ai) / oldAi)
    )
  
  # Create convergence value and count
  convergence = 1
  count = 0
  
  while (convergence > desired_convergence) {
    
    # Increment count
    count = count + 1
    if(verbose){
      print(paste0("iteration ", count))
    }
    
    
    
    # Calibrate Ai
    if(deterrence_function == "power_function"){
    tbl = tbl %>% 
      mutate(Ai = Bj * Dj * (network_dist^beta)) %>%
      
      group_by(orig_code, vehic) %>% 
      mutate(Ai = 1/sum(Ai)) %>% 
      ungroup()
    } else {
    tbl = tbl %>% 
      mutate(Ai = Bj * Dj * exp(network_dist*beta)) %>%
      
      group_by(orig_code, vehic) %>% 
      mutate(Ai = 1/sum(Ai))%>% 
      ungroup()
    }
    
    # Now, if not the first iteration, calculate the
    # difference between  the new Ai values and the old
    # Ai values and once done, overwrite the old Ai
    # values with the new ones.
    
    if(count == 1){
    tbl <- tbl %>% 
      mutate(oldAi = Ai)
  } else {
    tbl <- tbl %>% 
      mutate(diff = abs((oldAi - Ai) / oldAi),
             oldAi = Ai)
  }
    
    # Optionally print out values
    cnvg_ai = tbl %>% 
      pull(diff) %>% sum()
    if(verbose){
      print(paste("Ai convergence", cnvg_ai))
    }
    
    # Now similar calibration for Bj
    
    
    if(deterrence_function == "power_function"){
    tbl = tbl %>% 
      mutate(Bj = Ai * Oi * (network_dist^beta)) %>%
      
      group_by(dest_code) %>% 
      mutate(Bj = 1/sum(Bj))%>% 
      ungroup()
    } else {
    tbl = tbl %>% 
      mutate(Bj = Ai * Oi * exp(network_dist*beta)) %>%
      
      group_by(dest_code) %>% 
      mutate(Bj = 1/sum(Bj))%>% 
      ungroup()
    }
    
    # Now, if not the first iteration, calculate the
    # difference between  the new Bj values and the old
    # Bj values and once done, overwrite the old Bj
    # values with the new ones.
    
    if(count == 1){
    tbl <- tbl %>% 
      mutate(oldBj = Bj)
  } else {
    tbl <- tbl %>% 
      mutate(diff = abs((oldBj - Bj) / oldBj),
             oldBj = Bj)
  }
    
    # Optionally print out values
    convergence = tbl %>% 
      pull(diff) %>% sum()
    if(verbose){
      print(paste("Bj convergence", convergence))
    }
    
  }
  tbl <- tbl %>% 
    select(-contains("old"), -diff)
  
  return(tbl)

  
  
}



# Calibrate for Ai and Bi
agg_tm_flows <- get_bf_da(agg_tm_flows)


#Predict flows using multiplicative formulae
agg_tm_flows %>% 
  mutate(bf_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)),
         diff_flows = .pred - bf_flows) %>% 
  filter(vehic == "taxi") %>% 
  select(OD, Ai, Oi, Bj, Dj, network_dist, beta, bf_flows, .pred, diff_flows)

# A tibble: 17 x 10
   OD             Ai    Oi    Bj    Dj network_dist   beta bf_flows .pred diff_flows
   <chr>       <dbl> <int> <dbl> <int>        <dbl>  <dbl>    <dbl> <dbl>      <dbl>
 1 2839-2839 0.00266     3 0.661   301       0.0735 -0.244        3     3          0
 2 2855-2855 0.00128    11 0.823   308       0.0600 -0.244        7     7          0
 3 2855-2857 0.00128    11 1.02    293       1.33   -0.244        4     4          0
 4 2857-2855 0.00155    13 0.823   308       1.33   -0.244        5     5          0
 5 2857-2857 0.00155    13 1.02    293       0.281  -0.244        8     8          0
 6 2877-2839 0.00170    16 0.661   301       1.53   -0.244        5     5          0
 7 2877-2877 0.00170    16 0.647   516       0.441  -0.244       11    11          0
 8 2922-2922 0.00275    13 0.632   230       0.183  -0.244        8     8          0
 9 2922-2923 0.00275    13 0.354   446       1.48   -0.244        5     5          0
10 2928-2928 0.00643     3 0.421   292       0.382  -0.244        3     3          0
11 3020-3054 0.00549    19 1.61    220      15.2    -0.244       19    19          0
12 3118-3118 0.0376      3 1        22       0.458  -0.244        3     3          0
13 3245-3259 0.00659     3 2.30     86       2.95   -0.244        3     3          0
14 3250-3250 0.00658     4 0.541   229       0.433  -0.244        4     4          0
15 3291-3291 0.272       3 1         3       0.436  -0.244        3     3          0
16 3302-3302 0.117       5 1         8       0.756  -0.244        5     5          0
17 3335-3335 0.00255     3 0.596   347       0.0726 -0.244        3     3          0

0.2 Investigating ban scenarios

0.2.1 Increase in trip cost

This model allows one to explore the effect of costs on the number of trips, just like in Equation 12. Due to the nature of this model, i.e ensuring that \(\sum T_{ij}^m = O_i^m\) and \(\sum_m T_{ij} = D_j\), calibrating both parameters \(A_i\) and \(B_j\) results in the number of predicted trips as before. We therefore calibrate one parameter \(B_j\) since an increase in costs affects the number of trips arriving at a destination.

Say we increased motorbike costs by a factor 10.5, what would be the effect of this?

Code

# Increase cost of using motorbikes
mc_tm_flows <- agg_tm_flows %>%  
  mutate(network_dist = case_when(
    vehic == "moto" ~ network_dist*10.5,
    TRUE ~ network_dist
  ))

# Calibrate Bj since this will result in a change in flows
# At the destinations
mc_tm_flows <- mc_tm_flows %>%
  group_by(dest_code) %>%
  mutate(Bj = Ai * Oi * (network_dist^beta)) %>%
  mutate(Bj = 1/sum(Bj)) %>%
  # group_by(orig_code, vehic) %>%
  # mutate(Ai = Bj * Dj * (network_dist^beta)) %>%
  #     mutate(Ai = 1/sum(Ai))%>%
  #get_bf_da(mc_tm_flows) %>% 
  ungroup() %>% 
  mutate(scenario_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)))

# Compare change in flows:
mc_tm_flows %>% 
  group_by(vehic) %>% 
  summarise(est_flows = sum(.pred)) %>% 
  left_join(
    mc_tm_flows %>% 
      group_by(vehic) %>% 
      summarise(scenario_flows = sum(scenario_flows))) %>% 
  mutate(diff_flows = scenario_flows- est_flows) %>% 
  arrange(-diff_flows)

# A tibble: 7 x 4
  vehic est_flows scenario_flows diff_flows
  <chr>     <dbl>          <dbl>      <dbl>
1 car        5061           5362        301
2 bike        872            937         65
3 ebike       876            932         56
4 walk        287            304         17
5 taxi         99            104          5
6 bus         112            114          2
7 moto       9867           9420       -447

0.2.2 Change in transport mode

Since this model segregates the number of trips generated in each origin \(i\) by each mode \(m\), it can allow us to change the individual transport modes similar to Equation 11 and unlike Equation 12.

For instance in the case of a ban, we can change modify what motorbike users would result to, generate new \(O_i^m\)s and \(D_j\)s like in Section 0.0.5, and use this model to predict flows.

Note

Train option as an alternative means was randomly replaced with the other modes since the train flows were too few to be modelled.

Code

library(tidytext)
set.seed(2056)
# Identify OD communes that motorbike users were previously using
OD_moto <- model_tflows_moto %>% 
  distinct(OD) %>% 
  pull(OD)

# Since respondents provide alternative means, just randomly pick 1
scenario_moto <- df_trips_flows %>% 
  mutate(across(contains("_code"), as.character)) %>% 
  filter(vehic == "moto") %>% 
  filter(OD %in% OD_moto) %>% 
  unnest_tokens(output = scenario_vehic, input = alt_veh, token = "words", drop = F) %>% 
    relocate(scenario_vehic, .after = alt_veh) %>% 
  group_by(id) %>% 
  slice_sample(n = 1) %>% 
  ungroup() %>% 
  group_by(orig_code, dest_code, OD, scenario_vehic) %>% 
  summarize(flows = n(), network_dist = mean(network_dist)) %>% 
  ungroup() %>% 
  # Make train bus
  mutate(scenario_vehic = case_when(
    scenario_vehic == "lighttrain" ~ sample(c("bus", "bike", "ebike", "taxi", "car", "walk"), 1),
    TRUE ~ scenario_vehic
  ))

# Display summary of what motorbike users result to
scenario_moto %>% 
  group_by(scenario_vehic) %>% 
  summarise(flows = sum(flows)) %>% 
  arrange(-flows)

# A tibble: 6 x 2
  scenario_vehic flows
  <chr>          <int>
1 taxi            2548
2 car             2233
3 bus             1776
4 ebike           1256
5 bike            1206
6 walk             849

The table above shows what alternative vehicle motorbike users would result to.

Using this info, let’s update our \(O_i^m\) s and \(D_j^m\)s and estimate flows.

Code

# Add alternative flows to existing ones
new_agg_tm_flows <- scenario_moto %>% 
  rename(vehic = scenario_vehic) %>% 
  mutate(.pred = 0, .after = flows) %>% 
  bind_rows(
    agg_tm_flows %>%
      filter(vehic != "moto") %>% 
      select(all_of(scenario_moto %>% 
  rename(vehic = scenario_vehic) %>% 
    mutate(.pred = 0, .after = flows) %>% 
    colnames()))
  ) %>% 
  group_by(orig_code, dest_code, OD, vehic) %>% 
  summarise(flows = sum(flows), 
            network_dist = mean(network_dist)) %>% 
  ungroup()

# Generate new Ois and Djs 
new_agg_tm_flows <- new_agg_tm_flows %>% 
  group_by(orig_code, vehic) %>% 
  mutate(Oi = sum(flows), .after = orig_code) %>%
  ungroup() %>% 
  group_by(dest_code) %>%
  mutate(Dj = sum(flows), .after = dest_code) %>%
  ungroup() %>% 
  left_join(agg_tm_flows %>% distinct(vehic, beta))

new_agg_tm_flows %>% 
  select(OD, Oi, Dj, vehic, network_dist, beta) %>% 
  slice_sample(n = 5)

# A tibble: 5 x 6
  OD           Oi    Dj vehic network_dist    beta
  <chr>     <int> <int> <chr>        <dbl>   <dbl>
1 2877-2846    57   191 bus          3.40  -0.283 
2 2887-3237    35    60 taxi         5.01  -0.244 
3 3340-3340    36    97 car          0.136  0.0408
4 2922-2839   177   301 car          4.84   0.0408
5 2839-2877    84   516 bike         2.43   0.171

Say the above were sample future estimates of \(O_i^m\)s and \(D_j\)s, flows can be estimated using Equation 13

Code

# Model flows
new_agg_tm_flows <- new_agg_tm_flows %>%
  # group_by(dest_code) %>%
  # mutate(Bj = Ai * Oi * (network_dist^beta)) %>%
  # mutate(Bj = 1/sum(Bj)) %>%
  # group_by(orig_code, vehic) %>%
  # mutate(Ai = Bj * Dj * (network_dist^beta)) %>%
  #     mutate(Ai = 1/sum(Ai))%>%
  get_bf_da() %>% 
  ungroup() %>% 
  mutate(scenario_vehic_flows = round(Ai * Oi * Bj * Dj * (network_dist^beta)))


# Change in modal flows
# Observe changes in flows
agg_tm_flows %>% 
  group_by(vehic) %>% 
  summarise(observed_flows = sum(flows)) %>% 
  left_join(new_agg_tm_flows %>% 
  group_by(vehic) %>% 
  summarise(scenario_vehic_flows = sum(scenario_vehic_flows))) %>% 
  mutate(across(where(is.numeric), ~ coalesce(.x, 0)),
         diff_flows = scenario_vehic_flows - observed_flows) %>% 
  arrange(-diff_flows)

# A tibble: 7 x 4
  vehic observed_flows scenario_vehic_flows diff_flows
  <chr>          <dbl>                <dbl>      <dbl>
1 taxi              99                 2652       2553
2 car             5060                 7294       2234
3 bus              110                 1887       1777
4 ebike            876                 2132       1256
5 bike             870                 2080       1210
6 walk             287                 1131        844
7 moto            9868                    0      -9868

With this, one can then go ahead and investigate the changes in flows at destination, new routes, density of flows etc.

References

Casey Jr, Harry J. 1955. “The Law of Retail Gravitation Applied to Traffic Engineering.” Traffic Quarterly 9 (3).

Cordera, Rubén, Ángel Ibeas, Luigi dell’Olio, and Borja Alonso. 2017. Land Use–Transport Interaction Models. CRC press.

Furness, Kenneth P. 1965. “Time Function Iteration.” Traffic Engineering and Control 7 (7): 458–60.

Haynes, Kingsley E, and A Stewart Fotheringham. 2020. “Gravity and Spatial Interaction Models.”

O’Kelly, Morton. 2009. “Spatial Interaction Models.”

Ortúzar, Juan de Dios, and Luis G. Willumsen. 2011. “Modelling Transport,” March. https://doi.org/10.1002/9781119993308.

Wilson, A G. 1971. “A Family of Spatial Interaction Models, and Associated Developments.” Environment and Planning A: Economy and Space 3 (1): 1–32. https://doi.org/10.1068/a030001.

Wilson, Alan. 1970. “Entropy in Urban and Regional Modelling, Monographs in Spatial and Environmental Systems Analysis.” London, UK: Pion, no. 1.