The basic premise of this analysis is the question: which parameters of a given graph most strongly determine the selection of nodes/edges to monitor (STRAP-1), or the number of inspections on nodes/edges (STRAP-2)? Intuitively, if a graph represents an area, the population on its nodes or edges should dictate strategies for identifying human trafficking. However, because information on the proportions of trafficking on nodes or edges is imperfect, it is important to check whether any proxy for these proportions affects selection or inspection decisions. This analysis investigates any such impact.
The optimization problems STRAP-1 and STRAP-2 are solved on instances whose parameters are randomly initialized as follows:

- Population on nodes: random(1,10).
- Population on edges: random(1,10).
- Costs of monitoring nodes and edges: random(0,3).
- Proportions of trafficking activity on nodes and edges: random(0,0.5).
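A minimal R sketch of this random initialization follows. It assumes that random(a,b) means a continuous uniform draw (`runif`); the original code may draw integer populations, and the function and field names here are hypothetical.

```r
# Sketch: draw one random instance as described in the list above.
# Assumption: random(a, b) is a continuous uniform draw; all names are
# hypothetical and not taken from the original code.
random_instance <- function(V, E) {
  list(
    pop_V  = runif(V, min = 1, max = 10),   # population on nodes
    pop_E  = runif(E, min = 1, max = 10),   # population on edges
    cost_V = runif(V, min = 0, max = 3),    # monitoring cost per node
    cost_E = runif(E, min = 0, max = 3),    # monitoring cost per edge
    prop_V = runif(V, min = 0, max = 0.5),  # trafficking proportion on nodes
    prop_E = runif(E, min = 0, max = 0.5)   # trafficking proportion on edges
  )
}

set.seed(1)
inst <- random_instance(V = 10, E = 45)
str(inst)
```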
The following parameters are fixed:

- The number of nodes, V.
- The number of edges, E.
- TPR (true positive rate) = 0.5.
- TNR (true negative rate) = 0.5.
- Budget = 20.
- At least K = 2 traffickers.
- At most L = 10 regular individuals.
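The fixed settings can be collected in one configuration list, sketched below with hypothetical list and field names. The reported run uses V = 10 and E = 45; if the graph is undirected and simple, 45 is exactly choose(10, 2), so every pair of nodes would be connected. The snippet checks that arithmetic, not the actual graph structure.

```r
# Fixed parameters from the list above (names are hypothetical).
fixed_params <- list(
  V      = 10,   # number of nodes in the reported run
  E      = 45,   # number of edges in the reported run
  TPR    = 0.5,  # true positive rate
  TNR    = 0.5,  # true negative rate
  budget = 20,
  K      = 2,    # at least K traffickers
  L      = 10    # at most L regular individuals
)

# An undirected simple graph on V nodes has at most choose(V, 2) edges;
# E equal to that bound means the graph would be complete.
is_complete_undirected <- with(fixed_params, E == choose(V, 2))
is_complete_undirected
```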
Each time the function executes, the solver produces a solution for the instantiated graph with V nodes and E edges. After each run of the optimization, the following quantities are collected:

1. Total number of nodes/edges selected for monitoring (STRAP-1).
2. Total number of inspections on nodes/edges (STRAP-2).
3. Mean and variance of population on nodes.
4. Mean and variance of population on edges.
5. Mean and variance of trafficking activity on nodes and edges.
6. Mean and variance of costs of monitoring and inspections.
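The sketch below shows how one run's statistics could be turned into a row of the design matrix, using the column names that appear in the regression output (`mean-V`, `var-Cw`, ...). The mapping of those names to quantities is an assumption inferred from the list above (V/E: population on nodes/edges; Cv/Cw: node/edge monitoring cost; r/s: trafficking proportion on nodes/edges), as is the use of the sample variance. `solver_outcome` is a hypothetical stand-in for the STRAP solver's result.

```r
# Sketch: build one design-matrix row from a single run.
# Assumed name mapping (not stated in the text): V/E = node/edge population,
# Cv/Cw = node/edge cost, r/s = node/edge trafficking proportion.
summary_row <- function(pop_V, pop_E, cost_V, cost_E, prop_V, prop_E,
                        solver_outcome) {
  stats <- list(E = pop_E, V = pop_V, Cw = cost_E, Cv = cost_V,
                r = prop_V, s = prop_E)
  means <- vapply(stats, mean, numeric(1))
  vars  <- vapply(stats, var,  numeric(1))  # sample variance (divides by n - 1)
  names(means) <- paste0("mean-", names(stats))
  names(vars)  <- paste0("var-",  names(stats))
  # check.names = FALSE keeps non-syntactic names such as "mean-V"
  data.frame(outcome = solver_outcome, t(means), t(vars),
             check.names = FALSE)
}

row <- summary_row(pop_V = c(2, 4, 6), pop_E = c(1, 3),
                   cost_V = c(0, 1, 2), cost_E = c(1, 1),
                   prop_V = c(0.1, 0.2, 0.3), prop_E = c(0.25, 0.25),
                   solver_outcome = 3)
row[["mean-V"]]  # 4
row[["var-V"]]   # 4
```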
The goal is to determine which of the mean and variance statistics (items 3-6) is the most important predictor of the monitoring and inspection solutions produced by the solver. In other words, we are answering the following question:

“Given the graph of an area, with its population in cities/towns and on the transportation links connecting them, how should one decide between inspections and selections of cities/towns for monitoring?”
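As a hedged sketch of the regression setup, the block below fits `lm(nodesedges_choice ~ ., data = dmat1)`, the formula shown in the output that follows, to synthetic data. The outcome here is invented (it depends only on `mean-V`) purely to illustrate how `~ .` uses every other column and how non-syntactic names such as `mean-V` are carried through with backticks; it does not reproduce the solver's data.

```r
# SYNTHETIC stand-in for dmat1; the real data come from the STRAP-1 runs.
set.seed(42)
n <- 500
preds <- c("E", "V", "Cw", "Cv", "r", "s")
X <- as.data.frame(matrix(runif(n * 12), nrow = n))
names(X) <- c(paste0("mean-", preds), paste0("var-", preds))

# Hypothetical outcome: depends on `mean-V` only, plus noise.
dmat1 <- data.frame(
  nodesedges_choice = 4 - 2 * X[["mean-V"]] + rnorm(n, sd = 0.1),
  X, check.names = FALSE
)

# `~ .` regresses the outcome on all 12 other columns.
fit1 <- lm(nodesedges_choice ~ ., data = dmat1)
coef(summary(fit1))["`mean-V`", ]  # estimate should be close to -2
```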
```
## [1] "Number of nodes 10 Number of edges 45"
```
Regression of the number of nodes/edges selected for monitoring (STRAP-1) on the summary statistics:

```
## Call:
## lm(formula = nodesedges_choice ~ ., data = dmat1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.4825 -0.8096 -0.0531  0.7283  4.0213
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.913609   0.150032  26.085  < 2e-16 ***
## `mean-E`    -0.053767   0.007049  -7.628 2.49e-14 ***
## `mean-V`    -0.263264   0.014998 -17.553  < 2e-16 ***
## `mean-Cw`    0.029624   0.046762   0.634 0.526409
## `mean-Cv`    0.021349   0.022164   0.963 0.335448
## `mean-r`     0.084201   0.268422   0.314 0.753760
## `mean-s`     0.110642   0.125879   0.879 0.379437
## `var-E`     -0.009003   0.002714  -3.317 0.000912 ***
## `var-V`     -0.066340   0.006344 -10.457  < 2e-16 ***
## `var-Cw`     0.087468   0.077851   1.124 0.261226
## `var-Cv`     0.038032   0.032023   1.188 0.234985
## `var-r`      1.418884   2.019330   0.703 0.482282
## `var-s`      0.567491   0.864446   0.656 0.511523
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8121 on 19987 degrees of freedom
## Multiple R-squared:  0.02399,  Adjusted R-squared:  0.0234
## F-statistic: 40.94 on 12 and 19987 DF,  p-value: < 2.2e-16
```
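The fit statistics in this summary follow from the standard linear-model identities. With 19,987 residual degrees of freedom and 13 coefficients (intercept plus 12 regressors), the fit uses n = 20,000 runs. The sketch below recomputes the adjusted R², the F-statistic, and one t value from the printed numbers; the helper names are illustrative.

```r
# Recompute the selection-model fit statistics from the printed values.
p <- 12                   # regressors, excluding the intercept
df_resid <- 19987         # residual degrees of freedom
n <- df_resid + p + 1     # observations used in the fit: 20000

adj_r2 <- function(r2) 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- function(r2) (r2 / p) / ((1 - r2) / (n - p - 1))

r2 <- 0.02399             # printed Multiple R-squared
round(adj_r2(r2), 4)      # 0.0234, as printed
round(f_stat(r2), 2)      # 40.94, as printed

# A t value is the estimate divided by its standard error, e.g. `mean-V`:
round(-0.263264 / 0.014998, 3)  # -17.553, as printed
```

The same identities applied to the inspection model's R² of 0.02667 give an adjusted R² of about 0.0261 and an F-statistic of about 45.6, consistent with its printed values up to rounding.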
Regression of the number of inspections on nodes/edges (STRAP-2) on the summary statistics:

```
## Call:
## lm(formula = nodesedges_inspections ~ ., data = dmat2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.05532 -0.01351 -0.00514  0.00298  1.95682
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  9.202e+00  1.376e-02 668.650  < 2e-16 ***
## `mean-E`    -4.630e-04  6.470e-04  -0.716  0.47424
## `mean-V`    -3.733e-03  1.367e-03  -2.731  0.00632 **
## `mean-Cw`   -3.266e-04  4.287e-03  -0.076  0.93928
## `mean-Cv`    1.535e-03  2.042e-03   0.752  0.45233
## `mean-r`    -3.370e-01  2.469e-02 -13.647  < 2e-16 ***
## `mean-s`    -7.091e-02  1.150e-02  -6.167 7.10e-10 ***
## `var-E`     -5.173e-04  2.517e-04  -2.055  0.03987 *
## `var-V`      6.157e-04  5.887e-04   1.046  0.29571
## `var-Cw`     6.092e-05  7.127e-03   0.009  0.99318
## `var-Cv`    -1.903e-03  2.937e-03  -0.648  0.51689
## `var-r`     -2.934e+00  1.832e-01 -16.017  < 2e-16 ***
## `var-s`     -6.247e-01  7.851e-02  -7.957 1.85e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0743 on 19987 degrees of freedom
## Multiple R-squared:  0.02667,  Adjusted R-squared:  0.02608
## F-statistic: 45.63 on 12 and 19987 DF,  p-value: < 2.2e-16
```
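The summaries above give a first look at the most important determinants of the ‘inspection vs. selection’ decision. For selections (STRAP-1), the significant regressors are the mean and variance of population on nodes (`mean-V`, t = -17.55; `var-V`, t = -10.46) and on edges (`mean-E`, t = -7.63; `var-E`, t = -3.32), all with negative coefficients, while costs and trafficking proportions are not significant. For inspections (STRAP-2), the strongest regressors are the variance and mean of trafficking activity (`var-r`, t = -16.02; `mean-r`, t = -13.65; `var-s`, t = -7.96; `mean-s`, t = -6.17), with population terms at most weakly significant. In short, variance in trafficking activity on nodes or edges is the stronger determinant of inspections, while average population on nodes or edges is the stronger determinant of the selection decision. Both models explain only a small fraction of the variation in the solver's output (R² of about 0.024 and 0.027), so these are statements about relative rather than absolute predictive power. The relative importance of the regressors, computed with the LMG method (each regressor's contribution to R², averaged over all orderings in which the regressors can enter the model), confirms this ranking.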
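Selection model (STRAP-1):

```
##       mean-V        var-V       mean-E        var-E
## 0.0150331393 0.0053647707 0.0027610566 0.0005678792
```

Inspection model (STRAP-2):

```
##       var-r      mean-r       var-s      mean-s
## 0.012367921 0.008849545 0.003017544 0.001799478
```

These values appear to be unnormalized shares of R²: the four listed for each model already sum to about 0.0237 and 0.0260, close to the respective R² of 0.0240 and 0.0267, so the remaining eight regressors contribute little in either model.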
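The LMG metric itself can be computed directly when there are only a few regressors. The base-R sketch below (function names are hypothetical) averages each regressor's increase in R² over all orderings, written as the equivalent weighted sum over subsets, and checks on simulated data that the shares add up to the full-model R². It needs on the order of 2^p model fits, so for the twelve regressors here a package such as relaimpo (`calc.relimp(fit, type = "lmg")`) is the practical route.

```r
# R^2 of the regression of y on the columns `cols` of X (0 for no columns).
r2_subset <- function(y, X, cols) {
  if (length(cols) == 0) return(0)
  summary(lm(y ~ ., data = X[, cols, drop = FALSE]))$r.squared
}

# LMG: for each regressor j, sum over subsets S of the other regressors of
# w(|S|) * (R^2(S + j) - R^2(S)), with w(k) = k! (p - k - 1)! / p!,
# which equals the average increment over all p! orderings.
lmg <- function(y, X) {
  p <- ncol(X)
  out <- setNames(numeric(p), names(X))
  for (j in seq_len(p)) {
    others <- setdiff(seq_len(p), j)
    for (k in 0:(p - 1)) {
      w <- factorial(k) * factorial(p - k - 1) / factorial(p)
      subsets <- if (k == 0) {
        list(integer(0))
      } else {
        # index into `others` so a single remaining regressor is handled
        lapply(combn(length(others), k, simplify = FALSE),
               function(i) others[i])
      }
      for (S in subsets) {
        out[j] <- out[j] + w * (r2_subset(y, X, c(S, j)) - r2_subset(y, X, S))
      }
    }
  }
  out
}

# Usage on simulated data: `a` matters most, `c` not at all.
set.seed(7)
n <- 200
X <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- 2 * X$a + 1 * X$b + rnorm(n)
imp <- lmg(y, X)
round(imp, 3)
all.equal(sum(imp), summary(lm(y ~ ., data = X))$r.squared)  # TRUE
```

Because the increments telescope along every ordering, the LMG values always sum to the full model's R², which is why they can be read as a decomposition of R² rather than as standalone effect sizes.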