Project Ideas for DIMACS Modules

DIMACS Reconnect, Princeton 2022

Boyan Kostadinov, City Tech

June 17, 2022

Project 1:
Optimizing the Location of First Responders

The Optimization Problem

Find the optimal location for first responders to an emergency situation, using historical (simulated) data from a particular area accessible by the emergency teams, by making certain assumptions. The idea is to build a probability model for the response time and minimize the expected response time to find the optimal location for the emergency center.

Assumptions

  • The area to be covered by the emergency response team is divided into a rectangular grid of size \(N\times N\).
  • Historical (simulated) paired data \((t_j,d_j)_{j=1}^n\) is provided for the response time and the distance to the emergency location.
  • Historical (simulated) data is provided for the frequency of emergency calls (per year) from each cell in the grid.
  • We assume a coordinate system centered at the lower, left-hand corner of the rectangular grid.

Fitting a power model to the data

We use least squares optimization to fit the power model \(t = t_0 + a d^b\) of response time \(t\) as a function \(d\), the distance traveled. Thus, we need to find the optimal values of \(t_0\), \(a\) and \(b\) that minimize the sum of squared errors between the power model prediction and the paired data of size \(n\).

\[\textrm{SSE}(t_0,a,b)=\sum_{j=1}^n(t_j - t_0 - a d_j^b)^2\]

Fitting a power model to the data

This is nonlinear least squares since the dependence on \(b\) is nonlinear, but we can either linearize the power law or use a general purpose optimizer to find the optimal solution.

Linearization gives the extra benefit of using linear algebra and explaining the mathematics of linear least squares.

An interesting extension may be to use the Manhattan distance instead of the Euclidean distance.

Building a probability model

We can use the call frequencies for each cell, say \(n_k\) for the \(k\)th cell, to construct a probability distribution with probabilities \((p_k)_{k=1}^{N^2}\) for a distance random variable \(D\) whose values \((d_k)_{k=1}^{N^2}\) are given by the distances \(d_k\) between the location of the first responders at \((x_0,y_0)\), and the center \((x_k,y_k)\) of the \(k\)th cell:

\[d_k = \sqrt{(x_0 - x_k)^2 + (y_0 - y_k)^2}\]

And the corresponding probabilities are given by:

\[p_k=\frac{n_k}{\sum_{k=1}^{N^2} n_k}\]

Building a probability model

We can use the power model fitted to the data \(t=\hat{t}_0 + \hat{a} d^{\hat{b}}\) as the basis for a stochastic model for the random response time \(T\) as a function of the random distance \(D\) between the location of a random emergency call and the location of the first responders:

\[T = \hat{t}_0 + \hat{a} D^{\hat{b}}\]

The expected (average) response time is then:

\[E(T) = \hat{t}_0 + \hat{a} E(D^{\hat{b}})\]

Building a probability model

Using the empirical distribution of \(D=(p_k,d_k)_{k=1}^{N^2}\), we can compute the expected value:

\[E(D^{\hat{b}}) = \sum_{k=1}^{N^2} p_k d_k^{\hat{b}}\]

where we will have numerical values for everything on the right-hand side except for the coordinates \((x_0,y_0)\) of the facility that will house the first responders.

Minimizing the average response time

The optimization problem is to minimize the expected response time and find the optimal values of \((x_0,y_0)\) for the location of the emergency center.

\[\textrm{Min }_{x_0,y_0} E(T) = \textrm{Min }_{x_0,y_0} \left ( \hat{t}_0 + \hat{a} \sum_{k=1}^{N^2} p_k \left ( \sqrt{(x_0 - x_k)^2 + (y_0 - y_k)^2} \right )^{\hat{b}} \right )\] This is a nonlinear optimization problem that can be solved with a general purpose optimizer or using an implemented from scratch random search optimization. We may also use the Manhattan distance instead of the Euclidean distance.

Project 2:
Reconstructing a Shredded Image

The Original Image

The image shredded
into vertical strips

Reconstructing the shredded image

The idea is for the vertical strips of the shredded image to represent “cities” and to find a Hamiltonian path, which goes through the “cities”, by using a TSP solver. Then, we rearrange the vertical strips according to the optimal “city” order obtained by the TSP solver, and see if we can get a good approximation to the original image.

It is computationally easier to work with an image shredded into single-pixel columns and compute the distance matrix for all pairs of columns using an \(L^1\) or \(L^2\) norm. All three color channels (RGB) must be used when computing the distance in order to find the closest “cities”.

The image reconstructed from one shredded into single-pixel columns

Project 3:
Visual Art Via Tiling

The tiling process

  1. Load the image and convert it to gray scale.
  2. Reduce the resolution of the image as well as its dimension. Each new (big) pixel is summarized with the mean of the gray scale values of the group of original pixels inside it.
  3. Divide these average values into a number of groups.
  4. Represent the new pixels with ggplot, using geom_tile.

Visual art with tiles

Visual art via the TSP

In this experiment, we use the TSP to draw a portrait:

  1. Load a photo.
  2. Convert it to black and white.
  3. Generate a random sample of points that will play the role of the “cities”.
  4. Solve the TSP to compute a route among the points.
  5. Plot the route. The result is a single, continuous line drawing of the image.

Visual art via the TSP

Project 4:
Portfolio Optimization

A Markowitz-type portfolio optimization

The main goal is to follow Bob Vanderbei’s approach to portfolio optimization using historical prices of ETF’s tracking the major stock indices as well as the US Treasury Bonds. The optimization approach is to use a linear programming formulation based on the Manhattan norm.