Genetic Algorithm (GA) is a search-based optimization technique based on the principles of Genetics and Natural Selection. It is frequently used to find optimal or near-optimal solutions to difficult problems which otherwise would take a lifetime to solve. It is frequently used to solve optimization problems, in research, and in machine learning.
GA generates a population, the individuals in this population (often called chromosomes) have a given state. Once the population is generated, the state of these individuals is evaluated and graded on their value. The best individuals are then taken and crossed-over - in order to hopefully generate ‘better’ offspring - to form the new population. In some cases the best individuals in the population are preserved in order to guarantee ‘good individuals’ in the new generation (this is called elitism).
To explain the example I will use my version of the Knapsack problem.
You are going to spend a month in the wilderness. You’re taking a backpack with you, however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of “survival points”. You’re objective is to maximize the number of survival points.
The following table shows the items you can choose from.
ITEM SURVIVALPOINTS WEIGHT pocketknife 10.00 1.00 beans 20.00 5.00 potatoes 15.00 10.00 unions 2.00 1.00 sleeping bag 30.00 7.00 rope 10.00 5.00 compass 30.00 1.00
library(genalg)
library(ggplot2)
dataset <- data.frame(item = c("pocketknife", "beans", "potatoes", "unions",
"sleeping bag", "rope", "compass"), survivalpoints = c(10, 20, 15, 2, 30,
10, 30), weight = c(1, 5, 10, 1, 7, 5, 1))
weightlimit <- 20
dataset
## item survivalpoints weight
## 1 pocketknife 10 1
## 2 beans 20 5
## 3 potatoes 15 10
## 4 unions 2 1
## 5 sleeping bag 30 7
## 6 rope 10 5
## 7 compass 30 1
Before creating the model we have to set-up an evaluation function. The evaluation function will evaluate the different individuals (chromosomes) of the population on the value of their gene configuration.
An individual can for example have the following gene configuration: 1001100.
Each number in this binary string represents whether or not to take an item with you. A value of 1 refers to putting the specific item in the knapsack while a 0 refers to leave the item at home. Given the example gene configuration we would take the following items;
We can check to what amount of surivival points this configuration sums up. This will gave a value to the gene configuration of a given chromosome. This is exactly what the evaluation function does.
chromosome = c(1, 0, 0, 1, 1, 0, 0)
dataset[chromosome == 1, ]
## item survivalpoints weight
## 1 pocketknife 10 1
## 4 unions 2 1
## 5 sleeping bag 30 7
cat(chromosome %*% dataset$survivalpoints)
## 42
In GAs, we have a pool or a population of possible solutions to the given problem. These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.
The genalg algorithm tries to optimize towards the minimum value. Therefore, the value is calculated as above and multiplied with -1. A configuration which leads to exceeding the weight constraint returns a value of 0 (a higher value can also be given).
evalFunc <- function(x) {
current_solution_survivalpoints <- x %*% dataset$survivalpoints
current_solution_weight <- x %*% dataset$weight
if (current_solution_weight > weightlimit)
return(0) else return(-current_solution_survivalpoints)
}
Next, we choose the number of iterations, design and run the model.
iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01, elitism = T, evalFunc = evalFunc)
plot(GAmodel)
plot(GAmodel,type = "hist")
It means Best Solution : 1 1 0 1 1 1 1
The best solution is found to be 1111101. This leads us to take the following items with us on our trip into the wild.
solution = c(1, 1,0, 1, 1, 1, 1)
dataset[solution == 1, ]
## item survivalpoints weight
## 1 pocketknife 10 1
## 2 beans 20 5
## 4 unions 2 1
## 5 sleeping bag 30 7
## 6 rope 10 5
## 7 compass 30 1
This in turn gives us the total number of survival points.
cat(paste(solution %*% dataset$survivalpoints, "/", sum(dataset$survivalpoints)))
## 102 / 117
Solving Difficult Problems
In computer science, there is a large set of problems, which are NP-Hard. What this essentially means is that, even the most powerful computing systems take a very long time (even years!) to solve that problem. In such a scenario, GAs prove to be an efficient tool to provide usable near-optimal solutions in a short amount of time.