Genetic algorithms (GAs) are computationally intensive global search heuristics, or metaheuristics. They have very wide applicability, but their implementation is often highly problem specific and there is only a relatively loose understanding as to why they often work rather well. This Section provides an initial outline of GAs and how they can be applied in spatial analysis. Section 8.4.2, Genetic algorithm components, provides details of the main components of GAs, whilst Section 8.4.3, Example GA applications, provides example geospatial applications and specifications for the various GA components used in these instances.

The central idea behind GAs is to mimic the Darwinian notion that selective breeding seeks optimum individuals in a given environment. In order to do this a GA requires a way of representing a solution to a (spatial) problem as if it were an individual viewed as a chromosome or ‘genome’. Each individual (chromosome) is typically modeled as a string of codes (genes) in a particular order (see further, below). GAs then apply a fitness function which is used to determine which members of the population of solutions are retained and which are discarded. The fittest members are selected and a new generation is created through the genetic operations of crossover (e.g. splitting each chromosome into two parts or extracting a fixed length segment and recombining each part with the other) and mutation (e.g. simple interchange of genes within a chromosome). This gives rise to two offspring (new potential members of the solution set) whose fitness is then evaluated. The least fit member or members of the original population are then replaced with the fittest offspring and the process is iterated for a large number of generations or until some other stopping criterion is reached. A simple web-based example of a genetic algorithm in action is provided via the NetLogo models page.

Although GAs often use binary strings to represent solutions, where each bit in the string corresponds to a particular element of the solution, many representations are possible. For the traveling salesman problem (TSP) for example, with n cities, a string might be used as the representation, where each element of the string contains the identity (e.g. a number, 103, 12 etc.) of each one of the cities, as shown below (note this could result in very long strings, which is not particularly efficient):

103 |
12 |
33 |
90 |
67 |
201 |
11 |
… |
… |

A population of P<<n! randomly permutated strings can then be generated as the starting point for the population (set of prospective solutions). Each of these strings has a cost or length, d, and the fitness of a string can be computed as 1/d, for example ― hence the fittest strings (highest fitness values) are the shortest tours. All that remains is to run the algorithm until a stopping criterion is reached.

An outline GA procedure for solving the TSP is thus:

• | Create a set of P randomly permuted strings representing potential solution tours ― P is often very large (e.g. 100s or 1000s), in order to ensure that the solution space is well represented from the start |

• | Evaluate the fitness of each tour (i.e. the 1/d function) and sort these by fitness |

• | Select random pairs of solutions from all or a percentage of the population with a probability based on their fitness. For each selected pair perform a crossover combination operation. With binary strings this is simple, but with lists of cities it is more complicated since duplication of cities is not permitted, so additional checks are required (e.g. removing cities in chromosome 1 that appear in a fixed length segment of chromosome 2 and vice versa, and then appending these segments onto each other’s chromosomes). Having performed this operation, mutate one or more of the genes of these offspring with a very low probability (e.g. randomly change gene 24, say, from a 1 to a 0, or randomly select two genes and swap them) |

• | Evaluate the fitness of the offspring |

• | Replace all or the least fit members of the population with the new generation (retaining the best performing individual or individuals from generation to generation is known as elitist selection. and often improves overall performance of the GA) |

• | Iterate the above steps until a specified stopping criterion has been reached |

The effect of the steps described is to gradually increase the average fitness of the population, such that the fittest members (hopefully) approximate the global optimum solution. Unfortunately this may not be the case, as GAs can often converge to local optima and then stubbornly refuse to improve. Furthermore, GAs are not particularly efficient in terms of computational resources (memory, time), especially when compared with other metaheuristics such as simulated annealing. They do have the advantage of considerable flexibility, however, which combined with their overall performance can make them a very attractive option.

By systematically modifying the parameters of the algorithm (population size, mutation rate, cross-over rate and type, fitness function) significant improvements may be made. It is often also useful to combine a GA with other optimization techniques, especially local search, in order to improve the results. This is in part because GAs tend to under-perform when compared with problem-specific algorithms, but their generality often finds good initial solutions, that can then be systematically improved.

Genetic algorithms have been applied increasingly in geospatial analysis, particularly addressing demanding problems, including clustering, zone design/re-districting, model-building, map labeling, map feature simplification, constrained optimum location problems, and many more. Dijk et al. (2002) provide a fuller discussion, some observations from which we examine in greater detail below (Sections 8.4.2 Genetic algorithm components and 8.4.3,Example GA applications ).