The basic premise of STRAP-III formulation is as follows: if a decision-maker is faced with a mix of humans and digital surveillance resources for human trafficking, what should be the allocation policy for an area such that any type-I or type-II errors are minimized? That is, which nodes and/or edges should be allocated a resource for minimizing detection error and ensuring at least K ‘units’ of trafficking activity is detected (traffickers or victims).
In order to illustrate solutions to this problem, consider the following graph:
## unable to reach CRAN
## Network attributes:
## vertices = 5
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 13
## missing edges = 0
## non-missing edges = 13
## density = 0.65
##
## Vertex attributes:
##
## city_population:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 10.0 12.0 11.8 14.0 14.0
## vertex.names:
## character valued attribute
## 5 valid vertex names
##
## Edge attributes:
##
## travel_population:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 5.000 4.615 6.000 7.000
## [1] "Population on vertices is "
## [1] 9 14 12 14 10
## [1] "Population on edges is given by the following adjacency matrix "
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 7 1 5 4
## [2,] 0 0 5 3 0
## [3,] 7 0 0 0 0
## [4,] 0 0 7 0 3
## [5,] 3 6 6 3 0
Given this graph, as representative of an area, we now start with defining the optimization problem. First, initialize the parameters of the graph:
## [1] "Cost of edge surveillance"
## [,1]
## [1,] 0.2985909
## [2,] 0.3824498
## [3,] 0.3392299
## [4,] 0.3043190
## [5,] 0.2813981
## [6,] 0.1603479
## [7,] 0.2160844
## [8,] 0.2538751
## [9,] 0.2682912
## [10,] 0.1252195
## [11,] 0.3268904
## [12,] 0.2290141
## [13,] 0.1896407
## [1] "Cost of vertex surveillance"
## [,1]
## [1,] 0.1648461
## [2,] 0.3378435
## [3,] 0.1480809
## [4,] 0.3732748
## [5,] 0.1168643
## [1] "Estimate (noisy) of human trafficking on edges"
## [,1]
## [1,] 0.19652491
## [2,] 0.40396945
## [3,] 0.09543817
## [4,] 0.03125569
## [5,] 0.40143971
## [6,] 0.22691506
## [7,] 0.05852149
## [8,] 0.04273295
## [9,] 0.12109762
## [10,] 0.48582426
## [11,] 0.41593361
## [12,] 0.32599641
## [13,] 0.32737021
## [1] "Estimate (noisy) of human trafficking on nodes"
## [,1]
## [1,] 0.05899827
## [2,] 0.34617981
## [3,] 0.24976303
## [4,] 0.14106362
## [5,] 0.44876007
## [1] "Sensitivity of surveillance devices (Homogeneous)"
## [1] 0.4
## [1] "Specificity of surveillance devices (Homogeneous)"
## [1] 0.4
Now, consider the device level imperfections – it may be corruption in case of humans or technical limitations (battery power, processing capacity) in case of devices. Initialize the following vector for the number of devices and the corresponding imperfection values. Higher values can be construed for devices while lower values can be for humans:
## [1] "Imperfection values are: "
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.2681594 0.2403329 0.2513642 0.19619 0.1912825 0.02482012 0.1146641
## [,8]
## [1,] 0.1875591
We now initialize the optimization setup. Using OMPR and GPLK solvers, following is a setup that can be used for solving this allocation problem:
#Optimization using MIP
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ROI)
## ROI: R Optimization Infrastructure
## Registered solver plugins: nlminb, glpk.
## Default solver: auto.
library(ROI.plugin.glpk)
library(ompr)
library(ompr.roi)
m <- MIPModel() %>%
##Variables that consider allocation on nodes
add_variable(occn[i,l], i=1:length(devices), l=1:(num_nodes), type="binary") %>%
##Variables that consider allocation on edges
add_variable(occe[i,j], i=1:length(devices), j=1:(num_edges), type="binary") %>%
##Budget constraint
add_constraint(sum_expr(CV[l,1]*occn[i,l], i=1:length(devices), l=1:num_nodes)+sum_expr(CE[j,1]*occe[i,j], i=1:length(devices), j=1:num_edges)<= B) %>%
##Trafficking activity constraint - at least K should be detected
add_constraint(sum_expr(H[i,1]*occn[i,l]*S2[l,1]*ytpr, i=1:length(devices), l=1:num_nodes)+sum_expr(H[i,1]*occe[i,j]*W2[j,1]*ytpr,i=1:length(devices), j=1:num_edges)>= K) %>%
##Allocation constraints
##Each resource can be allocated to at most 1 vertex
add_constraint(sum_expr(occn[i,l], i=1:length(devices))<=1, l=1:(num_nodes)) %>%
##Each resource can be allocated to at most 1 edge
add_constraint(sum_expr(occe[i,j], i=1:length(devices))<=1, j=1:(num_edges)) %>%
##Each resource can be allocated to at most 1 vertex OR 1 edge but NOT BOTH
add_constraint(occe[i,j]+occn[i,l]<=1, i=1:length(devices), j=1:(num_edges), l=1:(num_nodes)) %>%
##Each vertex can be allocated to at most 1 resource
add_constraint(sum_expr(occn[i,l], l=1:(num_nodes))<=1, i=1:length(devices)) %>%
##Each edge can be allocated to at most 1 resource
add_constraint(sum_expr(occe[i,j], j=1:(num_edges))<=1, i=1:length(devices)) %>%
##Overall allocations should not exceed number of resources available
add_constraint(sum_expr(occn[i,l],i=1:length(devices), l=1:(num_nodes))+sum_expr(occe[i,j], i=1:length(devices), j=1:(num_edges)) <= length(devices)) %>%
#Objective function setup
set_objective(sum_expr(H[i,1]*occn[i,l]*S1[l,1]*(1-ytnr), i=1:length(devices), l=1:num_nodes)+sum_expr(H[i,1]*occn[i,l]*S2[l,1]*(1-ytpr), i=1:length(devices), l=1:num_nodes)+sum_expr(H[i,1]*occe[i,j]*W1[j,1]*(1-ytnr), i=1:length(devices), j=1:num_edges)+sum_expr(H[i,1]*occe[i,j]*W2[j,1]*(1-ytpr), i=1:length(devices), j=1:num_edges),"min")
#Solving model using GLPK Solver
result <- solve_model(m, with_ROI(solver = "glpk", verbose = TRUE))
## <SOLVER MSG> ----
## GLPK Simplex Optimizer, v4.65
## 557 rows, 144 columns, 1760 non-zeros
## 0: obj = 0.000000000e+00 inf = 1.000e+00 (1)
## 7: obj = 3.934501603e+00 inf = 0.000e+00 (0)
## * 31: obj = 3.493956617e+00 inf = 1.539e-15 (0)
## OPTIMAL LP SOLUTION FOUND
## GLPK Integer Optimizer, v4.65
## 557 rows, 144 columns, 1760 non-zeros
## 144 integer variables, all of which are binary
## Integer optimization begins...
## Long-step dual simplex will be used
## + 31: mip = not found yet >= -inf (1; 0)
## Solution found by heuristic: 3.85456108048
## Solution found by heuristic: 3.73979156046
## Solution found by heuristic: 3.64479872829
## Solution found by heuristic: 3.59471115308
## Solution found by heuristic: 3.57472270357
## Solution found by heuristic: 3.5688337794
## Solution found by heuristic: 3.5688337794
## Solution found by heuristic: 3.5657489603
## Solution found by heuristic: 3.5616929605
## Solution found by heuristic: 3.55671606643
## + 1204: mip = 3.556716066e+00 >= tree is empty 0.0% (0; 481)
## INTEGER OPTIMAL SOLUTION FOUND
## <!SOLVER MSG> ----
#Indices of vertex/edges where resource is allocated
which(result$solution>0)
## occe[4,2] occe[8,5] occe[1,10] occe[2,11] occe[5,12] occn[6,2] occn[3,5]
## 12 40 73 82 93 118 139
Reading the output should be as follows: For occe[x,y] or occn[x,y], x => the resource while y => the edge/node it is allocated to. Now here are a few questions:
Does having more variety in options to choose from (resource count) help in achieving the goal of at least K units of trafficking activity detection at budget B?
At what point (resource count) does it become unfeasible to decide on an allocation policy that minimizes detection error while achieving at least K units detected?
Both these questions, from a practice perspective, help in planning recruitment and training of humans and device purchase in case of planning deployment.