Genetic algorithm components

In the following subsections we provide details of the main components that make up typical genetic algorithms. For each component we give examples of the various optional settings, many of which are available within general-purpose GA software toolsets such as GALib, from Matt Wall at MIT, and the commercial MATLAB Genetic Algorithm and Direct Search Toolbox.

Encoding or representation

GAs require the representation of possible solutions as chromosome- or genome-like objects. The most widely used representations are binary strings and lists, although other representations such as trees and multi-dimensional arrays are possible. A single chromosome or genome consists of separate elements or genes, each of which contains a value (or allele) which may be binary, integer, real-valued or even symbolic. A typical binary coded chromosome (the string of 0s and 1s in the second row below) would be of the form:

A   B   C   D   E   F   G
1   0   0   1   1   1   0

where A, B, C, … represent separate elements of the problem and the values ‘0’ and ‘1’ indicate the presence or absence of these elements. For example, Fischer and Leung (1998) used a 68-bit string to encode a multi-layer neural network spatial interaction model, in which individual bits were assigned meanings such as: bits 1, 8 and 15 = presence/absence of hidden layers 1, 2 and 3 respectively; bits 2-7 = the number of nodes in the first hidden layer (hence <64); etc.
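
By way of illustration only (this is a simplified sketch, not the Fischer and Leung encoding itself, and the function name is ours), a 7-bit block of this kind could be decoded in Python as follows:

def decode_layer(bits):
    """Decode a 7-bit block: bit 1 flags the presence of a hidden layer and
    bits 2-7, read as an unsigned binary number, give its node count (<64).
    This layout is an illustrative simplification of the scheme described above."""
    present = bits[0] == 1
    nodes = int("".join(str(b) for b in bits[1:7]), 2)
    return present, nodes

print(decode_layer([1, 0, 0, 1, 1, 1, 0]))   # (True, 14)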

Typical list representations might be of the form:

1   5   3   3   11   2   7

where each gene corresponds to a separate data item and the values (or alleles) 1,5,3,3… represent the assignments of individual data items to separate classes or clusters; or

103   12   33   90   67   201   11

where again each gene corresponds to a separate data item and 103,12,33… represent index values of these data items (e.g. identifying a set of discrete locations, such as cities or candidate facilities). Note that these last two representations require the use of very long strings as every element (gene) of the string (chromosome, genome) is a unique item from the dataset.
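
In Python, for instance, these two encodings are simply lists of integers (the variable names below are ours and the values are those shown above); for the index-based encoding a validity check against repeated genes is usually also required:

cluster_assignment = [1, 5, 3, 3, 11, 2, 7]          # gene i -> cluster label of data item i
location_indices = [103, 12, 33, 90, 67, 201, 11]    # gene i -> index of a discrete location

# each candidate location should normally appear at most once in the chromosome
assert len(set(location_indices)) == len(location_indices)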

Fitness function

Specification of a suitable fitness function is a key element in the design of a successful GA. It measures how good a single individual is in terms of meeting the problem objective, e.g. minimization of total travel time. Hence for many geospatial problems the fitness function is simply based on the objective function of the problem, or some simplified or surrogate version of this that provides a suitable comparative measure and can be computed without excessive processor overhead. The result of the fitness function is typically a real-valued number (fitness value), with higher values assigned to better performing individuals. These real values are generally transformed, often twice, to create a standardized score that is then used in the selection of parents for reproduction.

Fitness is typically computed for all members of the population, at each generation of the procedure, and the fitness values for all members are then ranked (first transform) such that the fittest=Rank 1 and the least fit=Rank P, where P is the population size. This ranking (fitness rank) is then used to guide selection of parents for reproduction, with higher ranking parents being selected preferentially. This can be achieved by randomly selecting from the top N or top x% of the ranked individuals, but more commonly some form of rank scaling is performed (second transform) and selection takes place from this modified dataset. Typically rank scaling assigns a fitness score to each chromosome based on its rank, r, such that better rankings (1,2,3,…) have disproportionately higher fitness scores. A standard transform is of the form:

sr = c/√r, r = 1, 2, …, P, where c is a constant

For example, with N=P=40, c is chosen such that the sum of the transformed values = N, in this case c=3.55. A graph of the transformed ranks is shown in Figure 8‑32. Other forms of scaling can be applied, such as simple inverse rank fitness scaling (e.g. of the form sr=k/r, where k is a constant). These transformed values then have an effect on the subsequent selection procedures that can be used.

Figure 8‑32 Rank score transform
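
As an illustration, the short Python sketch below (the function name is ours) computes these rank-based scores and reproduces the constant c≈3.55 quoted above for N=P=40:

import math

def rank_scale(P, N=None):
    """Rank scores s_r = c/sqrt(r), r = 1..P, with the constant c chosen
    so that the scores sum to N (taking N = P by default, as in the example)."""
    N = P if N is None else N
    raw = [1.0 / math.sqrt(r) for r in range(1, P + 1)]
    c = N / sum(raw)
    return c, [c * v for v in raw]

c, scores = rank_scale(40)
print(round(c, 2))          # approximately 3.55
print(round(scores[0], 2))  # score assigned to the rank 1 (fittest) individual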


Population initialization

Most GAs are initialized with a substantial number of chromosomes that represent alternative solutions to the problem being analyzed. The size of this population, P, is typically set between 100 and several thousand, depending on the nature of the problem, the size of the solution space and the need to provide thorough coverage of this space. Obviously, the larger this population the greater the amount of processing required. For multivariate optimization problems the population size should be significantly greater than the number of variables in the problem.

Typical initializations of population strings include simple uniform random, order-based random (random permutations), and allele-based random (i.e. randomized values). The initialization chosen will depend upon the problem and its encoding. Intelligent initialization is also possible, whereby prior statistical analysis of the solution space is used to guide the initialization towards ‘good’ starting points.
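
For example, these three schemes might be sketched in Python as follows (function names and argument choices are ours):

import random

def init_uniform_binary(P, n):
    """Simple uniform random: P binary strings of length n."""
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(P)]

def init_order_based(P, items):
    """Order-based random: P random permutations of a fixed set of items."""
    return [random.sample(items, len(items)) for _ in range(P)]

def init_allele_based(P, n, lo, hi):
    """Allele-based random: each gene drawn at random from its valid range."""
    return [[random.randint(lo, hi) for _ in range(n)] for _ in range(P)]

population = init_order_based(100, [103, 12, 33, 90, 67, 201, 11])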

Selection

Genetic algorithms select members of the current generation to act as parents for the next generation. Selection is biased according to the fitness of the parents, but selection of specific pairs of individuals for use in the creation of new children remains to be determined. The approach to selection depends to some extent on the way in which fitness values have been scaled. Note that the same parent may be selected multiple times, but selection of the same pair of parents may be disallowed.

There are several commonly used selection procedures:

Roulette: this selection procedure treats the fitness scores as if they were parts of a pie diagram, producing under random selection a form of roulette wheel that is biased in favor of the largest segments of the wheel (i.e. the fittest individuals)
Tournament: m>2 prospective parents are chosen at random from the population, P, and the fittest two are selected
Uniform random: the fitness scores of parents determine their probability of being selected, often achieved by an initial random selection proportionate to the fitness score for parent 1, with a fixed offset used to select parent 2 (e.g. regarding all parents as laid out along a line whose segment lengths reflect their scores)
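
As an illustration, the first two of these procedures might be sketched in Python as follows (function names are ours; higher scaled scores are assumed to indicate fitter individuals):

import random

def roulette_select(population, scores):
    """Roulette selection: the chance of picking an individual is
    proportional to the size of its 'slice' of the total score."""
    pick = random.uniform(0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running >= pick:
            return individual
    return population[-1]

def tournament_select(population, scores, m=3):
    """Tournament selection: draw m candidates at random and keep the fittest two."""
    contestants = random.sample(range(len(population)), m)
    best_two = sorted(contestants, key=lambda i: scores[i], reverse=True)[:2]
    return population[best_two[0]], population[best_two[1]]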

Reproduction

Reproduction is the process of creating a new set of children using pairs of parents selected from the current generation. In principle all parents could be candidates for selection. However, frequently only a proportion is selected for reproduction (crossover), typically 70%-80%. In addition a number of the fittest parents may be retained into the next generation, e.g. an elite set of, say, the best 2, 3 or more individuals. It has been found, in practice, that setting the crossover percentage to 100% does not yield good solutions in most cases, and likewise reducing the percentage too much (e.g. below 20%) is also ineffective.
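
A minimal sketch of such a reproduction step, assuming a fitness-biased parent selection and the single-point crossover described in the next subsection (the function name, crossover fraction and elite size shown are our illustrative choices), is:

import random

def reproduce(population, scores, crossover_fraction=0.75, elite=2):
    """Copy an elite set of the fittest parents into the next generation
    unchanged, then fill the remaining places with children: roughly 75%
    of selected pairs undergo single-point crossover, the rest are copied."""
    P, n = len(population), len(population[0])
    ranked = sorted(range(P), key=lambda i: scores[i], reverse=True)
    new_pop = [list(population[i]) for i in ranked[:elite]]
    while len(new_pop) < P:
        # parent choice biased towards higher-scoring individuals
        p1, p2 = random.choices(population, weights=scores, k=2)
        if random.random() < crossover_fraction:
            point = random.randint(1, n - 1)
            new_pop.append(p1[:point] + p2[point:])
            new_pop.append(p2[:point] + p1[point:])
        else:
            new_pop.extend([list(p1), list(p2)])
    return new_pop[:P]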

Crossover

Crossover is the process by which genetic material from one parent is combined with genetic material from the second parent to produce new offspring. The simplest form of crossover is called single-point. If the chromosome length is n, say, a random integer between 1 and n-1 is generated and the chromosome of each parent is effectively split after this position. For example, if A and B are the parents

A = [a b c d e f g h]     B = [1 2 3 4 5 6 7 8]

and the crossover point is 3, the following child is generated:

child 1 = [a b c 4 5 6 7 8]

A second child may also be generated, as follows:

child 2 = [1 2 3 d e f g h]

With two-point crossover two random numbers are generated, in a similar manner to that described above. Using the same example, with crossover points 3 and 6, we have:

child 1 = [a b c 4 5 6 g h]

A second child may also be generated, as follows:

child 2 = [1 2 3 d e f 7 8]

So-called uniform crossover utilizes a crossover template, a random binary vector that is the same length as the parent chromosomes. This template is used to select which genes from each parent should be crossed over. For example, if bit 1 in the template is 1, then gene 1 of parent 1 is chosen to create the first gene of the child. If bit 2 is 0 then gene 2 of parent 2 is chosen. A second child can then be generated by flipping the bit rule. With the template string: [1 1 0 0 1 0 0 0] the following children could be generated:

child 1 = [a b 3 4 e 6 7 8]

child 2 = [1 2 c d 5 f g h]
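
The three operators just described can be sketched in a few lines of Python (illustrative only; function names are ours). The calls at the end reproduce the children shown above:

import random

def single_point(a, b, point=None):
    """Single-point crossover: the first 'point' genes come from one parent,
    the remainder from the other."""
    point = random.randint(1, len(a) - 1) if point is None else point
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point(a, b, p1, p2):
    """Two-point crossover: the segment between the two points is exchanged."""
    return a[:p1] + b[p1:p2] + a[p2:], b[:p1] + a[p1:p2] + b[p2:]

def uniform(a, b, template):
    """Uniform crossover: template bit 1 takes the gene from parent a,
    bit 0 from parent b; the second child flips the rule."""
    c1 = [x if t == 1 else y for t, x, y in zip(template, a, b)]
    c2 = [y if t == 1 else x for t, x, y in zip(template, a, b)]
    return c1, c2

A = list("abcdefgh")
B = [1, 2, 3, 4, 5, 6, 7, 8]
print(single_point(A, B, 3))                    # children for crossover point 3
print(two_point(A, B, 3, 6))                    # children for crossover points 3 and 6
print(uniform(A, B, [1, 1, 0, 0, 1, 0, 0, 0]))  # children for the template shown above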

Other crossover options are possible, some of which are specific to the representation used (binary, list, tree etc.). For example, GALib supports the following crossover operators: arithmetic, blend, partial match, ordered, cycle, single point, two point, even, odd, uniform, node- and subtree-single point (see the GALib documentation for an explanation of these variants).

Heuristic (customized) crossover models may also be used, where knowledge of the problem representation and solution space can be used to ensure that valid and, ideally, fitter children are produced. For example, in many geospatial problems crossovers must be designed and checked to ensure that replication of genes does not occur (e.g. a candidate location may only appear once in the list). Likewise, the spatial nature of the problem may suggest that certain types of crossover are preferable to others; for example, locations that are geographically close to each other may be relevant in determining good (i.e. effective) crossover operations.
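
As one illustration (an assumption about how this might be handled, not a prescription), an ordered-crossover style operator for chromosomes that are lists of unique candidate locations can be sketched as follows; by construction no location appears more than once in the child:

import random

def ordered_crossover(a, b):
    """Copy a random slice of parent a into the child unchanged, then fill
    the remaining positions with the missing genes in the order in which
    they appear in parent b, so that no gene is replicated."""
    n = len(a)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j] = a[i:j]
    fill = [g for g in b if g not in child]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

print(ordered_crossover([103, 12, 33, 90, 67, 201, 11],
                        [11, 90, 103, 201, 12, 33, 67]))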

Mutation

Mutation is a process that randomly alters the population chromosomes in a manner that ensures greater genetic diversity and results in a broader search of the solution space than would otherwise occur. As with crossover, mutation may be achieved in many ways, some of which are representation dependent. The simplest form of mutation applies to vector representations, where a random gene is selected within each vector and, with a probability of, say, 1%, is altered by flipping the bit (binary representations) or by changing the allele value to a random number in the valid range for that allele. A percentage of genes may be selected for possible mutation, rather than just one. Furthermore, with some mutation models the nature and scale of mutation is governed by a statistical process (e.g. Gaussian) and has a decreasing probability as the GA is run. With tree representations mutation may involve the swapping of tree nodes or entire sub-trees.
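
For example, bit-flip and Gaussian mutation might be sketched in Python as follows (the function names, rates and bounds are ours):

import random

def flip_mutation(chromosome, rate=0.01):
    """Binary representation: each gene is flipped independently with a
    small probability (1% here)."""
    return [1 - g if random.random() < rate else g for g in chromosome]

def gaussian_mutation(chromosome, rate, sigma, lo, hi):
    """Real-valued representation: selected genes are perturbed by a
    Normal(0, sigma) amount and clipped to the valid allele range; sigma
    (and/or rate) can be reduced as the generations proceed."""
    out = []
    for g in chromosome:
        if random.random() < rate:
            g = min(hi, max(lo, g + random.gauss(0.0, sigma)))
        out.append(g)
    return out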

Local search

Solutions found by GAs are represented by the fittest member or members of the evolved population. Some problems may have multiple solutions with equal optimal values, whilst for others multiple solutions with comparable (not necessarily optimal) fitness may be desirable. However, GAs do not guarantee that optimal solutions will be found if they exist, nor do they provide an indication of how close they get to theoretical optimal solutions. Their results are generally sub-optimal, and sometimes relatively poor or mediocre. One way of improving this performance, often dramatically, is to combine the GA process with some form of local search algorithm, e.g. a greedy search or interchange search algorithm. This may be applied to all, or more typically a subset of, solutions, modifying the offspring to generate children that have significantly improved fitness and which then replace less fit members of the population.
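
A minimal sketch of such a hybrid step, assuming a list-of-unique-locations encoding and a user-supplied fitness function returning higher values for better solutions (all names here are hypothetical), is:

def interchange_improve(solution, candidates, fitness):
    """Simple interchange local search: repeatedly try replacing each gene
    with each unused candidate location, accepting any swap that improves
    fitness, until no further improvement is found."""
    best = list(solution)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for c in candidates:
                if c in best:
                    continue
                trial = best[:i] + [c] + best[i + 1:]
                if fitness(trial) > fitness(best):
                    best, improved = trial, True
    return best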

Termination

In many implementations termination of the GA occurs when a pre-specified number of iterations or generations has passed. Other criteria include convergence (e.g. of the fitness of the fittest individual towards the population mean), stabilization at a non-improving fitness value, or some combination of these options.
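
A simple sketch combining these criteria in Python (the thresholds shown are arbitrary illustrations, not recommended settings) is:

def should_terminate(generation, best_history, max_generations=500,
                     stall_generations=50, tolerance=1e-6):
    """Stop after a fixed number of generations, or when the best fitness
    has improved by no more than 'tolerance' over the last
    'stall_generations' generations."""
    if generation >= max_generations:
        return True
    if len(best_history) >= stall_generations:
        recent = best_history[-stall_generations:]
        if max(recent) - min(recent) <= tolerance:
            return True
    return False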