DIMACS Reconnect, Princeton 2022
Boyan Kostadinov, City Tech
June 17, 2022
Find the optimal location for first responders to an emergency situation, using historical (simulated) data from a particular area accessible to the emergency teams, under certain modeling assumptions. The idea is to build a probability model for the response time and to minimize the expected response time in order to find the optimal location for the emergency center.
We use least squares optimization to fit the power model \(t = t_0 + a d^b\) of the response time \(t\) as a function of the distance traveled \(d\). Thus, we need to find the optimal values of \(t_0\), \(a\), and \(b\) that minimize the sum of squared errors between the power model prediction and the paired data of size \(n\):
\[\textrm{SSE}(t_0,a,b)=\sum_{j=1}^n(t_j - t_0 - a d_j^b)^2\]
This is a nonlinear least squares problem, since the dependence on \(b\) is nonlinear, but we can either linearize the power law or use a general-purpose optimizer to find the optimal solution.
Linearization gives the extra benefit of using linear algebra and explaining the mathematics of linear least squares.
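As a minimal sketch of the general-purpose optimizer route, assuming paired vectors d_obs and t_obs of observed distances and response times (simulated below only to make the example self-contained), the SSE can be minimized directly with optim:

```r
# Minimal sketch: fit t = t0 + a*d^b by minimizing SSE(t0, a, b) with optim().
# The data are simulated here only for illustration.
set.seed(1)
d_obs <- runif(50, 0.5, 10)                          # distances traveled
t_obs <- 2 + 1.5 * d_obs^0.8 + rnorm(50, sd = 0.3)   # response times

sse <- function(par) {
  t0 <- par[1]; a <- par[2]; b <- par[3]
  sum((t_obs - t0 - a * d_obs^b)^2)
}

fit <- optim(c(1, 1, 1), sse)   # Nelder-Mead, starting from (t0, a, b) = (1, 1, 1)
fit$par                         # estimated (t0_hat, a_hat, b_hat)
```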
An interesting extension may be to use the Manhattan distance instead of the Euclidean distance.
We can use the call frequencies for each cell, say \(n_k\) for the \(k\)th cell of an \(N \times N\) grid covering the service area, to construct a probability distribution with probabilities \((p_k)_{k=1}^{N^2}\) for a distance random variable \(D\), whose values \((d_k)_{k=1}^{N^2}\) are the distances between the location of the first responders at \((x_0,y_0)\) and the center \((x_k,y_k)\) of the \(k\)th cell:
\[d_k = \sqrt{(x_0 - x_k)^2 + (y_0 - y_k)^2}\]
The corresponding probabilities are given by:
\[p_k=\frac{n_k}{\sum_{j=1}^{N^2} n_j}\]
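A short sketch of this setup, assuming a hypothetical \(10 \times 10\) grid on the unit square with simulated call counts per cell:

```r
# Sketch: empirical distribution of the distance random variable D on an N x N grid.
# The grid, the call counts and the candidate location (x0, y0) are all hypothetical.
N  <- 10
xc <- rep((1:N - 0.5) / N, times = N)   # x coordinates of the cell centers
yc <- rep((1:N - 0.5) / N, each  = N)   # y coordinates of the cell centers
nk <- rpois(N^2, lambda = 5)            # simulated call frequencies n_k
pk <- nk / sum(nk)                      # probabilities p_k

x0 <- 0.5; y0 <- 0.5                    # a candidate location for the emergency center
dk <- sqrt((x0 - xc)^2 + (y0 - yc)^2)   # Euclidean distances d_k to the cell centers
```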
We can use the power model fitted to the data \(t=\hat{t}_0 + \hat{a} d^{\hat{b}}\) as the basis for a stochastic model for the random response time \(T\) as a function of the random distance \(D\) between the location of a random emergency call and the location of the first responders:
\[T = \hat{t}_0 + \hat{a} D^{\hat{b}}\]
The expected (average) response time is then:
\[E(T) = \hat{t}_0 + \hat{a} E(D^{\hat{b}})\]
Using the empirical distribution of \(D\), given by the pairs \((d_k, p_k)_{k=1}^{N^2}\), we can compute the expected value:
\[E(D^{\hat{b}}) = \sum_{k=1}^{N^2} p_k d_k^{\hat{b}}\]
where we will have numerical values for everything on the right-hand side except for the coordinates \((x_0,y_0)\) of the facility that will house the first responders.
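For a fixed candidate location, the expected response time is a one-line computation, reusing the fitted parameters and the vectors pk and dk from the sketches above:

```r
# Expected response time E(T) at the fixed candidate location (x0, y0).
t0_hat <- fit$par[1]; a_hat <- fit$par[2]; b_hat <- fit$par[3]
ET <- t0_hat + a_hat * sum(pk * dk^b_hat)   # E(T) = t0_hat + a_hat * E(D^b_hat)
ET
```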
The optimization problem is to minimize the expected response time and find the optimal values of \((x_0,y_0)\) for the location of the emergency center.
\[\min_{x_0,y_0} E(T) = \min_{x_0,y_0} \left ( \hat{t}_0 + \hat{a} \sum_{k=1}^{N^2} p_k \left ( \sqrt{(x_0 - x_k)^2 + (y_0 - y_k)^2} \right )^{\hat{b}} \right )\]
This is a nonlinear optimization problem that can be solved with a general-purpose optimizer or with a random search optimization implemented from scratch. We may also use the Manhattan distance instead of the Euclidean distance.
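A minimal sketch of this minimization with a general-purpose optimizer, reusing xc, yc, pk and the fitted parameters from the previous sketches:

```r
# Minimal sketch: minimize the expected response time E(T) over the location (x0, y0).
expected_time <- function(loc) {
  dk <- sqrt((loc[1] - xc)^2 + (loc[2] - yc)^2)   # Euclidean distances to the cell centers
  t0_hat + a_hat * sum(pk * dk^b_hat)             # E(T) at this candidate location
}

opt <- optim(c(0.5, 0.5), expected_time)   # Nelder-Mead, starting from the grid center
opt$par     # optimal location (x0, y0)
opt$value   # minimal expected response time
```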
The idea is for the vertical strips of the shredded image to represent “cities” and to find a Hamiltonian path through the “cities” using a TSP solver. Then we rearrange the vertical strips according to the optimal “city” order obtained by the TSP solver and see if we can get a good approximation to the original image.
It is computationally easier to work with an image shredded into single-pixel columns and compute the distance matrix for all pairs of columns using an \(L^1\) or \(L^2\) norm. All three color channels (RGB) must be used when computing the distance in order to find the closest “cities”.
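A minimal sketch of this step, assuming the TSP package is available and that img is a height \(\times\) width \(\times\) 3 RGB array with values in \([0,1]\) (for example, as read by png::readPNG); a dummy city is inserted so the solver returns a Hamiltonian path rather than a closed tour:

```r
# Sketch: treat each single-pixel column as a "city" and recover the column order
# with a TSP solver (assumes the TSP package; img is a height x width x 3 RGB array).
library(TSP)

cols <- t(apply(img, 2, c))             # one row per column, all three RGB channels flattened
D <- dist(cols, method = "euclidean")   # L2 distances between all pairs of columns

tsp  <- insert_dummy(TSP(D), label = "cut")   # dummy city turns the closed tour into a path
tour <- solve_TSP(tsp)
path <- cut_tour(tour, "cut")                 # Hamiltonian path through the columns

img_restored <- img[, path, ]           # reassemble the image in the optimal column order
```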
The reconstructed image can be plotted with ggplot, using geom_tile. In this experiment, we also use the TSP to draw a portrait.
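A sketch of the geom_tile plot, assuming the array img_restored from the previous sketch:

```r
# Sketch: plot an RGB array as colored tiles with ggplot2.
library(ggplot2)

h <- dim(img_restored)[1]; w <- dim(img_restored)[2]
df <- expand.grid(row = 1:h, col = 1:w)
df$fill <- rgb(img_restored[, , 1], img_restored[, , 2], img_restored[, , 3])
df$y <- h + 1 - df$row                  # flip vertically so the image is right side up

ggplot(df, aes(x = col, y = y, fill = fill)) +
  geom_tile() +
  scale_fill_identity() +   # use the hex colors as given
  coord_fixed() +
  theme_void()
```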
The main goal is to follow Bob Vanderbei’s approach to portfolio optimization, using historical prices of ETFs tracking the major stock indices as well as US Treasury bonds. The optimization approach is to use a linear programming formulation based on the Manhattan norm.
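As a sketch of what such a formulation can look like (stated in our own notation, not necessarily Vanderbei’s exact model), the Manhattan-norm risk measure can be linearized with auxiliary variables: let \(r_{jt}\) be the historical return of asset \(j\) in period \(t\), \(\mu_j\) its sample mean, \(x_j\) the portfolio weight, \(\rho\) a target expected return, and \(y_t\) a bound on the absolute deviation in period \(t\):
\[\begin{aligned}
\min_{x,\,y} \quad & \frac{1}{T}\sum_{t=1}^{T} y_t \\
\textrm{subject to} \quad & -y_t \le \sum_{j=1}^{m} (r_{jt} - \mu_j)\, x_j \le y_t, \quad t = 1,\dots,T, \\
& \sum_{j=1}^{m} \mu_j x_j \ge \rho, \qquad \sum_{j=1}^{m} x_j = 1, \qquad x_j \ge 0.
\end{aligned}\]
Replacing each absolute deviation by the two linear inequalities bounding it by \(y_t\) is what makes the problem a linear program.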