Optimization is important in many fields, including data science. In manufacturing, where every decision affects the process and the profit of the organization, optimization is employed everywhere: deciding how many units of each product to produce, scheduling units for production, finding the best process parameters, and determining routes, as in the traveling salesman problem. In data science, we are familiar with model tuning, where we adjust hyperparameters to improve model performance. Optimization algorithms can help us find a better model.
Some problems have conflicting objectives, or trade-offs, such as maximizing return while minimizing risk, or maximizing profit while minimizing production cost. These kinds of problems belong to the class of Multi-Objective Optimization Problems (MOOP). Solving a MOOP is more difficult and requires a different approach than single-objective optimization. One family of algorithms for this kind of problem is the Multi-Objective Evolutionary Algorithm (MOEA), an extension of the Genetic Algorithm to MOOP. If you are unfamiliar with the Genetic Algorithm, you can check my previous post before reading on. One of the most popular MOEAs is NSGA-II [1], from which many newer algorithms are derived. We will learn how to deploy it to solve MOOPs in business problems.
This post is dedicated to learning how to solve a Multi-Objective Optimization Problem and build the optimization workflow with R.
Optimization is the process of making something better. The goal is to find a solution that maximizes or minimizes an objective function subject to constraints. For example, we may want to maximize profit while at the same time minimizing the cost of production. If a problem has more than a single objective, it can be reduced to a single-objective problem, for example by aggregating the objectives into a weighted sum. However, this approach has some disadvantages: the weights must be chosen before the trade-offs are known, and a single aggregated run returns only one solution, hiding the trade-off between objectives.
Meanwhile, in a multi-objective optimization problem (MOOP), the trade-off can be identified because a MOOP algorithm returns multiple solutions across the Pareto front. Multiple solutions also give the decision maker alternatives to choose from instead of a single answer.
Since a MOOP has more than one objective function, we can't directly say that solution A is better than solution B. Instead, we use the notion of Pareto dominance.
Given a decision space \(\Omega\), a Multi-Objective Optimization Problem is concerned with finding a vector \(\vec{x} \in \Omega\) that optimizes the vector function

\[\vec{f}(\vec{x}) = [f_1(\vec{x}), f_2(\vec{x}), \dots, f_k(\vec{x})]\]
In a minimization problem, a vector \(\vec{x}\) is said to dominate \(\vec{y}\) (denoted \(\vec{x} \prec \vec{y}\)) if:

\[f_i(\vec{x}) \leq f_i(\vec{y}) \ \ \forall i \in \{1, \dots, k\} \quad and \quad \exists \ i : f_i(\vec{x}) < f_i(\vec{y})\]
In simple words: solution x dominates (is better than) solution y if every objective value of x is less than or equal to the corresponding objective value of y, and x is strictly better (has a lower value, in minimization) on at least one objective.
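To make the definition concrete, here is a tiny R helper (a hypothetical function for illustration, not part of any package) that checks whether one objective vector dominates another in a minimization problem:

# TRUE if x dominates y: no worse on every objective, strictly better on one
dominates <- function(x, y) {
  all(x <= y) && any(x < y)
}

dominates(c(1, 2), c(2, 2))  # TRUE: better on f1, equal on f2
dominates(c(1, 3), c(2, 2))  # FALSE: a trade-off, neither solution wins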
A solution is called dominant if it is better on at least one objective while being at least as good on all the others. Look at the graph below.
Suppose we have four solutions (call them A, B, C, and D), with the value of the first objective function on the x-axis (f1) and the second objective function (f2) on the y-axis. Both objectives are minimized, so the closer a point is to the bottom left, the better the solution. From the figure above, we can infer several things about which solutions dominate which.
Multi-Objective Evolutionary Algorithms (MOEA) apply evolutionary algorithms, such as the genetic algorithm, to MOOPs. There is already a massive amount of research in this area, resulting in a wide variety of algorithms. NSGA-II is one of the most popular choices and has inspired much of the later research.
Before we proceed, let's review how the Genetic Algorithm works:

1. Initialize a random population of candidate solutions.
2. Evaluate the fitness of each individual.
3. Select parents based on fitness.
4. Create offspring through crossover and mutation.
5. Form the next generation and repeat from step 2 until a stopping criterion is met.
For a multi-objective problem, NSGA-II follows the same workflow as the process above. The difference is that, at the end of each iteration, the population for the next iteration is chosen using non-dominated sorting: the solutions are divided into several fronts based on their domination status against each other. If the initial population size is 10, the top 10 individuals from the combined parent and offspring populations, taken front by front starting from the best front, are chosen for the next iteration. To control the diversity of solutions, the crowding distance of each solution is also calculated [1]; for two objectives it corresponds to the Manhattan distance between the two neighboring solutions. Within the last front that fits, the solutions with the largest crowding distance are selected.
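For intuition, here is a minimal sketch of the crowding distance calculation (my own illustration, not the internals of nsga2R; obj is assumed to be a matrix with one row per solution and one column per objective):

# Crowding distance: boundary solutions get Inf so they are always kept;
# interior solutions accumulate the normalized gap between their neighbors
crowding_distance <- function(obj) {
  n <- nrow(obj)
  dist <- rep(0, n)
  for (j in 1:ncol(obj)) {
    ord <- order(obj[, j])          # solutions sorted by objective j
    dist[ord[c(1, n)]] <- Inf       # extremes of each objective
    rng <- diff(range(obj[, j]))
    if (rng > 0 && n > 2) {
      for (i in 2:(n - 1)) {
        dist[ord[i]] <- dist[ord[i]] +
          (obj[ord[i + 1], j] - obj[ord[i - 1], j]) / rng
      }
    }
  }
  dist
}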
The problem is replicated from the study of Anagnostopoulos and Mamanis [2], which involves 3 objective functions. The study employed a MOEA for portfolio selection and optimization in investment management.
Portfolio optimization is concerned with managing a portfolio of assets so as to minimize the risk objectives subject to a constraint guaranteeing a given level of return. One of the fundamental principles of financial investment is diversification: investors spread their money across different types of assets. Diversification minimizes investors' exposure to risk and maximizes portfolio returns. In some cases it is very difficult for the decision maker to know beforehand the optimal number of securities to include in the portfolio without examining all the trade-offs between risk, return, and the cardinality of the portfolio. In any case, an investor might lose important solutions with large trade-offs between the objectives when forced to fix the number of assets in advance.
The objective functions are:
1. Maximizing Returns
The total expected return of the portfolio is maximized:
\[max \ \mu(x) = \sum_{i=1}^{N} x_i \ \mu_i\]
\(\mu(x)\): total mean return of the portfolio

\(x_i\): weight of asset \(i\)

\(\mu_i\): mean return of asset \(i\)
2. Minimizing Risk
The risk of the portfolio is represented by the variance of return.
\[min \ \rho(x) = \sum_{i=1}^{N}\sum_{j=1}^{N} x_i \ x_j \ \sigma_{ij}\]
\(\rho(x)\): total risk of portfolios
\(\sigma_{ij}\): covariance between asset \(i\) and \(j\)
3. Minimizing Number of Assets
The number of assets in the portfolio, i.e., the count of all assets with weight > 0, is minimized.
\[min \ card(x) = \sum_{i=1}^{N} 1_{x_i>0}\]
\(card(x)\): number of assets
Subject To
\[\sum_{i=1}^{N} x_i = 1\]
\[0 \leq x_i \leq 1\]
Data is acquired from the New York Stock Exchange dataset on Kaggle (https://www.kaggle.com/dgawlik/nyse). We will only use data from January to March 2015 for illustration.
# Packages used throughout this post
library(dplyr)
library(lubridate)
library(tidyr)
library(nsga2R)

nyse <- data.table::fread("data_input/prices.csv")

nyse <- nyse %>%
  mutate(date = ymd(date)) %>%
  filter(year(date) == 2015,
         month(date) %in% c(1:3))
head(nyse)
To get a clearer company name, let's import the Ticker Symbol and Security columns.
securities <- data.table::fread("data_input/securities.csv")

securities <- securities %>%
  select(`Ticker symbol`, Security) %>%
  rename(symbol = `Ticker symbol`,
         name = Security)
Join them with the price data.
nyse <- nyse %>%
  left_join(securities, by = c("symbol")) %>%
  select(date, symbol, name, everything())
head(nyse)
Let's limit the number of available stocks to 30 randomly selected ones. Too many choices would lead to a longer runtime, and since this is just an introduction, I will keep the problem small.
set.seed(13)
selected_stock <- sample(nyse$symbol, 30)
nyse <- nyse %>%
  filter(symbol %in% selected_stock)
head(nyse)
Let’s calculate the daily returns.
nyse <- nyse %>%
  rename(price = close) %>%
  select(date, symbol, name, price) %>%
  group_by(symbol, name) %>%
  mutate(price_prev = lag(price),
         returns = (price - price_prev) / price_prev) %>%
  slice(-1) %>%  # drop each stock's first day, which has no previous price
  ungroup()
head(nyse)
Let’s calculate the mean return of each stock.
mean_stock <- nyse %>%
  group_by(symbol) %>%
  summarise(mean = mean(returns)) %>%
  arrange(desc(mean))
head(mean_stock)
The value of \(R_f\) (the risk-free rate) is taken from the interest rate on a three-month U.S. Treasury bill. Since our price data covers early 2015, we use the rate from March 27, 2015, which is 0.04%. The rate is acquired from https://ycharts.com/indicators/3_month_t_bill.
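If you want the rate available in R for later reference (it is not used by the fitness function below), a one-line assignment is enough:

# 3-month T-bill rate on March 27, 2015 (0.04%), expressed as a decimal
rf <- 0.04 / 100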
Now we calculate the covariance matrix between stocks. First, we need to spread the returns of each stock into separate columns.
nyse_wide <- nyse %>%
  pivot_wider(id_cols = date, names_from = symbol, values_from = returns) %>%
  select(-date)
# Convert returns to excess returns by subtracting each stock's mean return.
# Matching columns by symbol avoids relying on column order, since
# mean_stock is sorted by descending mean rather than by column position.
for (s in colnames(nyse_wide)) {
  nyse_wide[[s]] <- nyse_wide[[s]] - mean_stock$mean[mean_stock$symbol == s]
}
head(nyse_wide)
Create the covariance matrix.
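A minimal sketch of this step, assuming the objects created above (mean_aligned is a helper name I introduce here so the mean returns follow the same column order as the covariance matrix; the fitness function below relies on it):

# Covariance matrix of the excess returns; row/column order follows nyse_wide
nyse_cov <- cov(nyse_wide)

# Mean returns reordered to match the covariance matrix columns
mean_aligned <- mean_stock$mean[match(colnames(nyse_wide), mean_stock$symbol)]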
Let’s define the fitness function.
fitness <- function(x){
  # x: weight of each stock, in the same order as the columns of nyse_wide
  weight_stock <- x

  # Penalty for violating the constraint that the weights must sum to 1
  penalty <- 1e9 * (round(sum(weight_stock), 10) - 1)^2

  # Objective 1: total expected return of the portfolio
  mean_return <- sum(weight_stock * mean_aligned) - penalty

  # Objective 2: portfolio risk (variance), w' Sigma w
  risk <- as.numeric(t(weight_stock) %*% nyse_cov %*% weight_stock) + penalty

  # Objective 3: number of assets with non-zero weight
  card <- sum(weight_stock > 0)

  # nsga2R minimizes all objectives, so the return objective is negated
  return(c(-mean_return, risk, card))
}
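Before running the optimization, a quick sanity check helps confirm that the function returns three values, for example with an equally weighted portfolio (a minimal probe, assuming the objects defined above):

# Equal weights across all selected stocks, so sum(w) is exactly 1
w <- rep(1 / n_distinct(nyse$symbol), n_distinct(nyse$symbol))
fitness(w)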
The NSGA-II algorithm can be run with the nsga2R() function from the nsga2R package.
Here are some of the parameters of the function:

- fn: the fitness function to be minimized
- varNo: the number of decision variables
- objDim: the number of objective functions
- lowerBounds: the lower bounds of each decision variable
- upperBounds: the upper bounds of each decision variable
- popSize: the size of the population
- generations: the number of generations

We will run the algorithm for 1000 generations with a population size of 200. The mutation probability (mprob) is 0.2, while the crossover probability (cprob) is 0.8.
set.seed(123)
n_asset <- n_distinct(nyse$symbol)

finance_optim <- nsga2R(fn = fitness, varNo = n_asset, objDim = 3, generations = 1000,
                        mprob = 0.2, popSize = 200, cprob = 0.8,
                        lowerBounds = rep(0, n_asset), upperBounds = rep(1, n_asset))
To check the Pareto front solutions, look at the objectives element of the result and subset only the rows whose paretoFrontRank is 1. Let's see how many distinct optimal solutions are generated.
finance_optim$objectives[finance_optim$paretoFrontRank == 1, ] %>%
  matrix(ncol = 3) %>%
  as.data.frame() %>%
  distinct() %>%
  mutate(V1 = round(-V1, 5),  # flip the sign back: the return was negated
         V2 = round(V2, 5)) %>%
  rename(`Total Return` = V1,
         Risk = V2,
         `Number of Assets` = V3)
It turns out there is only one distinct solution.

Let's check the total weight of the first solution.
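A minimal check (assuming the first row of parameters corresponds to the first solution shown above):

# Total weight of the first solution; the penalty term should force this to 1
sum(finance_optim$parameters[1, ])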
## [1] 1
The sum of the weights is 1, so we can be confident that our solution doesn't violate the constraint.
If you want the weight of each asset, use the parameters element of finance_optim. Let's look at the parameters of the first solution.
data.frame(symbol = colnames(nyse_wide),
           weight = finance_optim$parameters[1, ]) %>%
  left_join(securities, by = "symbol") %>%  # attach the company names
  mutate(return = round(weight * mean_aligned, 5),
         weight = round(weight, 5)) %>%
  select(symbol, name, weight, return)
The vehicle crashworthiness problem [3] is a multi-objective problem in which the crash safety level of a vehicle is optimized. A higher safety level reflects how well a vehicle can protect its occupants from the effects of a frontal accident.
The study considers a full-width frontal crash test and an offset-frontal crash test. The full-width impact usually results in high deceleration, which can cause serious biomechanical injuries to the occupants, while the offset test is demanding for the structural integrity of the vehicle. To improve the overall crashworthiness of a vehicle's frontal structure, these two crash scenarios should be considered simultaneously. There are five decision variables representing the thickness of reinforced members around the car front, as shown in the figure below.
The following are the 3 objectives that need to be minimized for optimal crashworthiness:
\[minimize\ f_1(x) = 1640.2823 + 2.3573285\ x_1 + 2.3220035\ x_2 +4.5688768\ x_3 + 7.7213633\ x_4 + 4.4559504\ x_5\]
\[minimize\ f_2(x) = 6.5856 + 1.15\ x_1 - 1.0427\ x_2 + 0.9738\ x_3 + 0.8364\ x_4 - 0.3695\ x_1x_4 + 0.0861\ x_1x_5 + \\ 0.3628\ x_2x_4 - 0.1106\ x_1^2 - 0.3437\ x_3^2 + 0.1764\ x_4^2\]

\[minimize\ f_3(x) = -0.0551 + 0.0181\ x_1 + 0.1024\ x_2 + 0.0421\ x_3 - 0.0073\ x_1x_2 + 0.024\ x_2x_3 - \\ 0.0118\ x_2x_4 - 0.0204\ x_3x_4 - 0.008\ x_3x_5 - 0.0241\ x_2^2 + 0.0109\ x_4^2\]
Constrained with:
\[1 \leq x_i \leq 3, \quad i = 1, \dots, 5\]
\(f_1\): The mass of the vehicle
\(f_2\): An integration of collision acceleration in the full frontal crash
\(f_3\): The toe-board intrusion in the 40% offset-frontal crash
Let's define the fitness function.
fitness <- function(x){
  # Calculate f1(x): vehicle mass
  f1 <- 1640.2823 + 2.3573285*x[1] + 2.3220035*x[2] + 4.5688768*x[3] +
    7.7213633*x[4] + 4.4559504*x[5]

  # Calculate f2(x): integrated collision acceleration in the full frontal crash
  f2 <- 6.5856 + 1.15*x[1] - 1.0427*x[2] + 0.9738*x[3] + 0.8364*x[4] -
    0.3695*x[1]*x[4] + 0.0861*x[1]*x[5] + 0.3628*x[2]*x[4] -
    0.1106*x[1]^2 - 0.3437*x[3]^2 + 0.1764*x[4]^2

  # Calculate f3(x): toe-board intrusion in the offset-frontal crash
  f3 <- -0.0551 + 0.0181*x[1] + 0.1024*x[2] + 0.0421*x[3] -
    0.0073*x[1]*x[2] + 0.024*x[2]*x[3] - 0.0118*x[2]*x[4] -
    0.0204*x[3]*x[4] - 0.008*x[3]*x[5] - 0.0241*x[2]^2 + 0.0109*x[4]^2

  # All three objectives are minimized
  return(c(f1, f2, f3))
}
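As a quick sanity check, evaluating the fitness function at the midpoint of the decision space should return three finite values (a minimal probe, not part of the original study):

# All five thicknesses set to the middle of their allowed range [1, 3]
fitness(rep(2, 5))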
We will run NSGA-II with 5000 generations and population size of 100.
set.seed(123)

car_optim <- nsga2R(fn = fitness, varNo = 5, objDim = 3, generations = 5000, popSize = 100,
                    lowerBounds = rep(1, 5), upperBounds = rep(3, 5))
Let's look at the optimal solutions: those that belong to the first front.
car_result <- car_optim$objectives %>%
  as.data.frame() %>%
  setNames(c("f1", "f2", "f3")) %>%  # name the objectives before binding
  bind_cols(setNames(as.data.frame(car_optim$parameters), paste0("x_", 1:5))) %>%
  filter(car_optim$paretoFrontRank == 1) %>%
  mutate(across(everything(), ~ round(.x, 3))) %>%
  distinct()

car_result
As we can see, multiple distinct solutions achieve similar objective values; all of them are non-dominated with respect to each other, meaning no solution is better than another in every objective. Let's compare the first 3 solutions.
Is the first solution better than the second? Remember that all objectives are minimized.

- the f1 value of the first solution is higher than the second solution
- the f2 value of the first solution is lower than the second solution
- the f3 value of the first solution is higher than the second solution

The second solution could dominate (be better than) the first solution only if its f2 value were also lower than the first solution's. Since the f2 of the second solution is higher, the first and second solutions do not dominate each other, and thus they are equally good.
How about the first solution and the third one?

- the f1 value of the first solution is lower than the third solution
- the f2 value of the first solution is lower than the third solution
- the f3 value of the first solution is higher than the third solution

The first solution could dominate the third solution only if its f3 value were also lower than the third solution's. Since it is not, the first and third solutions are non-dominated with respect to each other.
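These pairwise checks can also be written in code, reusing the dominates() helper sketched earlier (column names as defined in car_result above):

# Neither of the first three solutions dominates another,
# which is why all of them sit on the first front
s1 <- as.numeric(car_result[1, c("f1", "f2", "f3")])
s2 <- as.numeric(car_result[2, c("f1", "f2", "f3")])
s3 <- as.numeric(car_result[3, c("f1", "f2", "f3")])
dominates(s1, s2)  # FALSE
dominates(s2, s1)  # FALSE
dominates(s1, s3)  # FALSE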
The consequence is that the decision maker can decide which solution to pick by examining the trade-offs between objectives. If the decision maker prioritizes lower mass (f1), they should choose the solution with the lowest value of f1.

If you want the solution with the lowest mass, choose the first row of the table above; if you can accept a higher mass, the second row is an alternative. Based on the table, choosing a lower mass results in a higher toe-board intrusion (f3). This trade-off should be weighed by the decision maker.
We can visualize the objective values.
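One simple option is a base R pairs plot of the three objectives from the car_result table above (a minimal sketch; any scatter-plot tool works):

# Pairwise scatter plots of the Pareto-front objective values
pairs(car_result[, c("f1", "f2", "f3")],
      labels = c("Mass (f1)", "Acceleration (f2)", "Intrusion (f3)"))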
A Multi-Objective Optimization Problem (MOOP) must be solved differently from a single-objective problem. MOEAs, such as NSGA-II, can be used to solve MOOPs. There are many newer algorithms that perform better and improve on aspects of NSGA-II [4]. The lack of package development for MOEAs in R is a concern; we may need to build our own functions if we are not satisfied with NSGA-II.
References:

[1] A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II
[2] A portfolio optimization model with three objectives and discrete variables
[3] Multiobjective optimization for crash safety design of vehicles using stepwise regression model
[4] Multiobjective evolutionary algorithms: A survey of the state of the art