Genetic Algorithm

Genetic Algorithm (GA) is a search-based optimization technique based on the principles of Genetics and Natural Selection. It is frequently used to find optimal or near-optimal solutions to difficult problems which otherwise would take a lifetime to solve. It is frequently used to solve optimization problems, in research, and in machine learning.

GA generates a population, the individuals in this population (often called chromosomes) have a given state. Once the population is generated, the state of these individuals is evaluated and graded on their value. The best individuals are then taken and crossed-over - in order to hopefully generate ‘better’ offspring - to form the new population. In some cases the best individuals in the population are preserved in order to guarantee ‘good individuals’ in the new generation (this is called elitism).

To explain the example I will use my version of the Knapsack problem.

You are going to spend a month in the wilderness. You’re taking a backpack with you, however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of “survival points”. You’re objective is to maximize the number of survival points.

The following table shows the items you can choose from.

ITEM SURVIVALPOINTS WEIGHT pocketknife 10.00 1.00 beans 20.00 5.00 potatoes 15.00 10.00 unions 2.00 1.00 sleeping bag 30.00 7.00 rope 10.00 5.00 compass 30.00 1.00

Setup Dataset

library(genalg)
library(ggplot2)

dataset <- data.frame(item = c("pocketknife", "beans", "potatoes", "unions", 
    "sleeping bag", "rope", "compass"), survivalpoints = c(10, 20, 15, 2, 30, 
    10, 30), weight = c(1, 5, 10, 1, 7, 5, 1))
weightlimit <- 20

dataset
##           item survivalpoints weight
## 1  pocketknife             10      1
## 2        beans             20      5
## 3     potatoes             15     10
## 4       unions              2      1
## 5 sleeping bag             30      7
## 6         rope             10      5
## 7      compass             30      1

Setup Evalutation Function

Example

Before creating the model we have to set-up an evaluation function. The evaluation function will evaluate the different individuals (chromosomes) of the population on the value of their gene configuration.

An individual can for example have the following gene configuration: 1001100.

Each number in this binary string represents whether or not to take an item with you. A value of 1 refers to putting the specific item in the knapsack while a 0 refers to leave the item at home. Given the example gene configuration we would take the following items;

We can check to what amount of surivival points this configuration sums up. This will gave a value to the gene configuration of a given chromosome. This is exactly what the evaluation function does.

chromosome = c(1, 0, 0, 1, 1, 0, 0)
dataset[chromosome == 1, ]
##           item survivalpoints weight
## 1  pocketknife             10      1
## 4       unions              2      1
## 5 sleeping bag             30      7
cat(chromosome %*% dataset$survivalpoints)
## 42

Funciton for our Knapsack problem.

In GAs, we have a pool or a population of possible solutions to the given problem. These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.

The genalg algorithm tries to optimize towards the minimum value. Therefore, the value is calculated as above and multiplied with -1. A configuration which leads to exceeding the weight constraint returns a value of 0 (a higher value can also be given).

evalFunc <- function(x) {
    current_solution_survivalpoints <- x %*% dataset$survivalpoints
    current_solution_weight <- x %*% dataset$weight

    if (current_solution_weight > weightlimit) 
        return(0) else return(-current_solution_survivalpoints)
}

Run the Model to find the available

Next, we choose the number of iterations, design and run the model.

iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01,  elitism = T, evalFunc = evalFunc)
plot(GAmodel)

plot(GAmodel,type = "hist")

Finding Best Solution

It means Best Solution : 1 1 0 1 1 1 1

The best solution is found to be 1111101. This leads us to take the following items with us on our trip into the wild.

solution = c(1, 1,0, 1, 1, 1, 1)
dataset[solution == 1, ] 
##           item survivalpoints weight
## 1  pocketknife             10      1
## 2        beans             20      5
## 4       unions              2      1
## 5 sleeping bag             30      7
## 6         rope             10      5
## 7      compass             30      1

This in turn gives us the total number of survival points.

solution vs available

cat(paste(solution %*% dataset$survivalpoints, "/", sum(dataset$survivalpoints)))
## 102 / 117

Solving Difficult Problems

In computer science, there is a large set of problems, which are NP-Hard. What this essentially means is that, even the most powerful computing systems take a very long time (even years!) to solve that problem. In such a scenario, GAs prove to be an efficient tool to provide usable near-optimal solutions in a short amount of time.