<< Click to Display Table of Contents >>
## Heuristic and meta-heuristic algorithms |

The term heuristic algorithm refers to the use of systematic procedures that typically trade off the quality of a solution for processing speed. Many heuristics are derived from relatively simple, commonsense ideas for finding solutions to problems, but which will not necessarily produce an optimum solution. In some instances heuristics do produce provably optimal solutions. Heuristics are defined by a set of permitted moves or transformations of a current solution, Q say, to a new (generally improved) solution, Q*. The space of all possible solutions for a given problem is known as the search space, S, and the space of solutions that are obtainable by a single transformation of a current solution, Q, is known as the neighborhood space, N. A comparison of a number of alternative heuristics applied to a facility location problem is provided in Section 7.4.2, Larger p-median and p-center problems.

Heuristic algorithms will often provide a fast ‘upper bound’ or ‘lower bound’ to the value of some objective function, such as the average travel time from a set of demand locations to a set of supply locations or facilities. The ‘upper bound’ can provide one edge of a confidence band whose overall width can be determined if a procedure can be devised that provides a matching ‘lower bound’ for the problem at hand. The true upper bound and lower bound is the optimal solution itself — however, this value is generally unknown in advance and may be impossible to determine. An alternative approach is to solve a problem that is very similar to the problem at hand, often with fewer constraints on the permissible solutions, and for which a solution can be obtained rapidly. This is generally known as solving a ‘relaxed’ version of the original problem. A similar concept is to solve a set of sub-problems that collectively include all of the original dataset elements.

Once an upper and lower bound have been obtained, procedures can be devised that attempt to reduce the upper bound and raise the lower bound until they are arbitrarily close — if they meet the global optimum will have been obtained, and if not then at least the quality of the solution in terms of proximity to the optimum is known — a solution procedure that gives results that are within 5% of the optimum and can be obtained very rapidly is often preferable to a solution procedure that is optimal but which in practice may not do this using finite computer and time resources. An example of a relaxed problem might be to permit the use of Manhattan or minimax distance rather than Euclidean or network distance when assigning customers to facilities, or to entirely or partially remove one or more constraints on the problem as a temporary measure. A widely used procedure of this kind is known as Lagrangian relaxation (see further, Section 7.2.2, Heuristic and meta-heuristic algorithms).

Tests of combinatorial optimization procedures in general and heuristics in particular, often use the traveling salesman problem (TSP) as a benchmark for the quality and performance. This is because the problem is easily specified, known to be extremely difficult to solve, and may be examined using a library of available test data. In essence the TSP involves making a tour of every vertex in a given set (e.g. cities in a country, nodes on a VLSI circuit board), and returning to the starting point, such that the total tour length or cost is minimized. The distance measure used must be a metric or semi-metric, i.e. asymmetry may be permitted but distances must be positive and satisfy the triangle inequality. If there are N cities there are N! possible tours, so the search space is O(N!), a value which rapidly becomes immensely large (this reduce slightly to N!/2 if the distances are symmetric and (N-1)!/2 if the position of the first city is pre-defined). More details on the TSP are provided in Section 7.3.5, Tours, traveling salesman problems and vehicle routing, but it is referred to frequently in other sections of this Guide. An excellent introduction to TSP problems and solutions, is provided by the Wikipedia entry

en.wikipedia.org/wiki/Travelling_salesman_problem

Although the basic symmetric Euclidean TSP, as described above, is the most common baseline used for testing, many variants exist, including: asymmetric edges; time windows for deliveries/collections; multiple routes (partitioned sets of cities); capacity constrained links and vehicles; and many more. Their application is also not restricted to transportation problems, but extends to many other fields of research, from printed circuit board manufacture to job scheduling to genetic sequencing. See the TSPLIB website for more details.

Greedy heuristics and local search

So-called greedy heuristics involve a process whereby at each stage a locally optimum choice is made which may or may not lead to a globally optimum solution to some problem. As such, greedy algorithms are local search, or LS, procedures.

For example, suppose one has a set of points or vertices {V} in the plane and you wish to create a network of edges {E} such that every point is accessible via the network from every other point, and overall the total (Euclidean) length of the network is minimized. A greedy algorithm that solves this problem (the so-called Minimum Spanning Tree, or MST, problem) is as follows:

•Select any point {x} from V at random to start. Define the set V*={x} and the set E*={ }

i.e. the set V* is initialized with a single point chosen at random from the source vertices, and the set E* is initialized as an empty set of edges. Then,

•Find the nearest point in the set V not in V* (v say) to a point in the set V* (u say) and add this to the set V* together with the edge connecting v and u to E*. If two or more points are equidistant from V* then choose one of these at random

i.e. this step ensures that at each iteration the shortest/lowest cost edge joining a point in V* to a point not yet in V* is added to the solution set and will always form a connected tree. Then,

•Repeat the preceding step until V*=V. The set E* contains the MST

This is known as Prim’s algorithm (Prim, 1957) and is exact in the sense that the solution it finds is globally optimal. It is also relatively fast. Kruskal’s greedy algorithm for determining the MST of a weighted graph is similar in concept, but starts with the vertex set V already connected such that every vertex has a direct or indirect link to every other vertex. The shortest (least weighted) link is then identified and removed, being added to the solution set. The process continues, checking that loops in the network do not occur, until the vertices in the solution set form a complete tree, which is the MST. Both of these algorithms are effectively constructive in form, gradually building the solution. Some heuristics are essentially aimed at improvement rather than construction, for example by commencing with a random or best known solution and then seeking to systematically improve upon this by a well-defined procedure (e.g. local search or iterative improvement).

Most greedy algorithms are fast but, unlike those described above, yield sub-optimal solutions and in some cases the provably worst possible solution! This can be the case for a similar algorithm to that described above, but where the objective is to define the shortest possible tour of the set V with each vertex being visited only once, returning finally to the starting vertex (the basic Traveling Salesman Problem).

There are many variants to greedy algorithms, some of which are problem-dependent whilst others attempt to find a more systematic approach to improving the local optimum found by the procedure. For example, a greedy algorithm for a facility location problem could be to choose a set of N solution locations at random and then assign all customers to the nearest facility — e.g. retail stores and customers, children and schools, medical facilities and patients. This generates a very poor solution if one is trying to minimize the average or maximum travel time by customers to facilities. An improvement to this greedy random algorithm might be a greedy add variant, whereby a single initial facility is selected at random and all customers assigned to it, and then new facilities are added one by one at random locations but subject to some additional rule, with customers re-assigned according to the ‘nearest facility’ rule. This may yield an improved solution. Repeated runs of a greedy procedure may also yield a set of results from which the best in terms of the criteria to be optimized is selected. Indeed, characteristics identified from the best solutions may assist in a progressive improvement process. So-called GRASP heuristics (greedy randomized adaptive search procedures) start with a feasible solution from a more conventional greedy algorithm and then apply a local search algorithm to try and improve on this solution. Repeated runs of the GRASP procedure give rise to an overall best solution.

Interchange heuristics

Heuristics that commence with one solution to a problem (typically a combinatorial optimization problem) and then systematically swap members of this initial or current solution with members that either form part of another element of this current solution or belong to the set “not a member” are called interchange heuristics. There are many examples of such heuristics. For example, the automatic zoning procedures (AZP) described in Section 4.2.11, Districting and re-districting, are an example of such an interchange procedure, as is Jenk’s natural breaks method for classifying univariate data (see Figure 4‑22).

One of the best known interchange heuristics is the n-opt family applied to the standard form of the traveling salesman problem (TSP) with Euclidean distance metric. This is a simple improvement algorithm that applies to an existing symmetric tour of all the locations. The procedure simply takes two edges in the solution at random, say (i,j) and (k,l) and replaces them with (i,k) and (j,l) or (i,l) and (j,k). There are several improvements to this scheme which make a significant difference to its performance. These include checking that the revised tour contains no crossings, as this will never be the shortest configuration and only opting for interchanges to a candidate list of swaps that will yield the greatest benefit. 3-opt interchange is essentially the same as 2-opt but taking three edges at a time. This can be more effective and is essential for asymmetric problems, but at a higher computational cost.

In the field of location modeling, a common class of problems involves identifying potential facility locations and then allocating customers to those locations. The objective is generally to minimize the cost of providing services from p facilities to m customers. A series of algorithms have been developed to solve such problems. One of the best known in geospatial analysis is a vertex substitution algorithm due to Teitz and Bart (1968) — see further, Section 7.4.2, Larger p-median and p-center problems. The solution process systematically evaluates marginal changes to a given set of facility locations, as follows:

•An initial facility configuration is supplied to the algorithm (e.g. a random selection of p locations from a candidate set of n>p) — this provides the first "current solution"

•The first candidate not in the current solution is substituted for each facility site in the current solution and the customers are re-allocated based on this new configuration of facilities. The substitution yielding the largest decrease in the objective function, if any, is selected for a swap

•When all of the candidates not in the current solution have been substituted for all of the sites in the current solution, an iteration is complete and the process can then be repeated

The algorithm terminates when a single iteration does not result in a swap. At termination, the interchange heuristic generates facility configurations that meet all three necessary, but not sufficient, conditions for an optimum solution to the problem: all facilities are local medians (least travel cost/distance centers) for the demand points allocated to them; all demand points are allocated to their closest facility; and finally, removing a facility from the solution and replacing it with a candidate site not in the solution always incurs a net increase or no change in the value of the objective function. Note that this solution will not in general be the global optimum, is not necessarily unique, nor is there any immediate way of identifying how good the solution found really is (i.e. how close it is to the global optimum). As with other heuristics, there are many possible variants and extensions of the core interchange procedure.

Metaheuristics

The term metaheuristic was originally coined by Glover (1986). It is now is used to refer to developments of heuristic concepts that go beyond local search (LS) procedures as a means of seeking a global optimum, typically by analogy with natural (physical or biological) processes. Examples of metaheuristics include tabu search, simulated annealing, ant systems (see below) and genetic algorithms (see Section 8.4, Genetic Algorithms and Evolutionary Computing). A series of bi-annual international conferences on metaheuristics have been held in various locations in recent years.

The analogy of many of these procedures with biological systems is often slightly tenuous, drawing ideas rather than detail from the way ants search for food or animal genetics help to create fitter offspring. Furthermore, in many instances the techniques are applied to static problems, whereas biological systems operate in a dynamic environment, for which sub-optimal behavior that has in-built robustness and flexibility is often more important than temporary optimization — finding and eating all one’s prey in the shortest possible time may exhaust their population to the point they cannot reproduce and provide you with yet more food! This observation provides not only a note of caution about such analogy-based procedures, but also one of optimism that they may prove particularly useful in dynamical systems — indeed this has been found to be the case in fields such as dynamic telecommunications routing and real-time traffic management.

Tabu search

Tabu search is a metaheuristic that aims to overcome the problem of a local search (LS) procedure (e.g. a greedy heuristic) becoming trapped at a local optimum. As such it is a general purpose extension to LS procedures that operates by permitting non-improving moves whenever a local optimum is encountered. It achieves this by recording the recent history of searches in a tabu list (a sort of short-term memory) and ensuring that future moves do not search this part of the search space, at least not for a specified number of iterations. This greatly reduces the likelihood of changing a solution configuration in a manner that will simply cycle back to the current solution. See Glover and Laguna (1997) for a full discussion of the method and variants.

Tabu search methods are thus defined by the search space, the pattern of local moves (the neighborhood structure), and the use of search memory:

•The search space, S, is simply the space (or set for purely combinatorial problems) of all possible solutions to a given problem. Note that this may be very large, or for some problems (e.g. those involving a mixture of discrete and continuous variables to be optimized) innumerably large. The search space can include infeasible as well as feasible solutions, and in some cases permitting the search to extend into infeasible regions is essential (e.g. examining possible solutions for which constraints have been relaxed)

•The neighborhood structure determines the set of moves, or transformations, of the current search space S effected by a single iteration of the process. Hence the neighborhood, N, is a (typically small) subset of the search space: N⊂S. A simple example of such transformations is the set of interchange heuristics, where one or more elements from the current solution are interchanged with one or more elements from other parts of the current solution and/or elements that are currently outside the components of the current solution

•Search memory, in particular short-term search memory, is the characteristic that distinguishes tabu search from most other methods. A typical example would be the retention of a list of recent moves, whose reversal is tabu for a number of iterations (also known as the tabu tenure). With a network routing problem, in which customer A has just been moved from route 1 to route 2, one would prevent reversal of this interchange for a short period, in order to avoid the procedure cycling without improvement. The risk with this approach is that sometimes such moves are attractive and effective, and the process can be improved by relaxing strict tabus. The typical relaxation permitted (known as applying aspiration criteria) is to allow a tabu move if it results in a solution with an objective value that is an improvement on the best known value to date

Despite these protections, tabu search may still under-perform, either in terms of efficiency or quality of result. Various techniques have been devised to improve its performance, many of which are problem-specific in their design. These include: probabilistic selection of samples from the space N, in order to reduce the processing overhead and introduce randomness that again reduces the risk of encountering cycles; intensification, in which a number of components of the current solution (e.g. entire routes or assignments) are held fixed whilst other elements are allowed to continue being modified; and diversification, whereby components of the current solution that have appeared frequently or continuously since the start of the iteration process are systematically removed from the solution in order to give unused or rarely used components the chance of generating an overall improvement; surrogate objective functions can also improve performance (though not directly quality) of a solution by reducing the overhead that is sometimes present in computing the value of the objective function. If the surrogate function is highly correlated with the objective function and is very simple to compute, it can enable many more operations to be conducted within a given time period, hence expanding the range of solutions examined; such hybridization is becoming an increasingly common practice, whereby techniques like tabu search are hybridized with other solution approaches such as genetic algorithms.

Cross-entropy (CE) methods

The CE method is an iterative procedure that can be applied to a wide range of problems, including shortest path and traveling salesman problems (TSPs). It involves two phases:

•Generation of a sample of random data (trajectories, vectors, etc.) according to a specified random mechanism (typically a Monte Carlo process), and

•Updating the parameters of the random mechanism, on the basis of the data, in order to produce a "better" sample in the next iteration

The updating mechanism makes use of the discrete version of the cross-entropy statistic. In its basic form this statistic compares two probability distributions or one probability distribution and a reference distribution. Tests by the authors of this procedure (see www.cemethod.org) on TSP problems have shown that it is capable of providing good solutions in reasonable time. However, it does not out-perform the best TSP procedures currently available (see further, Section 7.3.5, Shortest (network) path problems), and unmodified does not take advantage of problem-specific enhancements that would improve its performance.

Simulated annealing

Simulated annealing is a metaheuristic devised by Kirkpatrick et al. (1983) and derives its name and approach from the behavior of metals and glass as they are systematically heated and re-heated and then allowed to cool steadily. The objective, as with other metaheuristics, is to obtain a close approximation to the global optimum for a given problem. The original paper by Kirkpatrick et al. proved that the approach would converge to the global optimum, given enough time, which encouraged considerable investigation into this, and a range of other metaheuristics for solving particularly difficult combinatorial problems. Simulated annealing can be viewed as a form of managed random walk through the solution space, S, in which exploration of the neighborhood space is determined by appeal to annealing behavior, which in turn relates to the temperature of the process over time. The procedure is essentially as follows:

•Define the initial configuration for the problem, e.g. a random solution set, S0*, and an initial temperature variable, T, and evaluate the cost, C0*, (e.g. the total distance or travel time) for this solution

•Perturb S0* to a new neighboring state, S1*, e.g. by moving a potential facility location by some random coordinate step or by an interchange process. Compute the cost of this new state, C1* and subtract C0*, to give the cost difference ΔC

•If ΔC<0 then the new configuration has lower overall cost, so select the new configuration as the preferred current configuration. However, if the cost is higher, still retain the option of keeping the new configuration according to the so-called Metropolis criterion:

if p<u where u is a uniform random number in the range [0,1]. If the criterion applies then the temperature variable, T is reduced by a factor, α say (e.g. α=0.9) and the process iterates from the previous step until some stopping criterion is reached (e.g. number of iterations, absolute or relative improvement of the objective function)

In a sense the temperature parameter controls the way in which the search space is traversed, initially in large steps and then, as the temperature reduces (the annealing schedule), in smaller and smaller steps until T=0 or another stopping criterion has been reached.

Simulated annealing is often a relatively slow technique, so modifications are often applied that are problem-specific and the result of analysis of the statistical behavior of the model, with greatly improvement performance being possible. Such changes, however, may remove the guarantee of ultimate global optimization. Significant advantages of simulated annealing include the simplicity of the basic algorithm, the low memory overhead of the procedure, and the range of optimization problems (geospatial and other) to which the approach may be applied. In the geospatial arena the procedure has been successfully applied to a variety of problems, including optimum facility location (e.g. see Muttiah et al., 1996) and traveling salesman problems.

Lagrangian multipliers and Lagrangian relaxation

Lagrangian relaxation is a generalization of the use of Lagrangian multipliers in classical optimization problems, so we commence with a discussion of such problems before introducing the notion of relaxation.

The so-called “classical constrained optimization problem” applies to real-valued continuous differentiable functions (known as class C1 functions), f(), g() and is of the form:

or, using vector notation:

Here the expression for z is known as the objective function, the expressions gi() are known as the constraints, the xi are variables and the di are constants. The constraints in this case are shown as equalities — inclusion of inequalities, which is common, can be handled in a variety of ways, including the use of surrogate variables.

This class of problems can be converted into an unconstrained optimization problem by the use of Lagrangian multipliers. This process often simplifies the task of finding local and global maxima or minima. In the example above, all of the constraints can be moved into the objective function, and we have:

The λi components of the expression above are known as the Lagrangian multipliers. Solving the constrained problem then becomes one of maximizing or minimizing a single expression, which in turn requires determination of the introduced multipliers. These may be obtained by finding the differential of the modified objective function with respect to each xi and equating the result to zero (see Walsh, 1975, for a fuller treatment of the use of Lagrangian multipliers in such problems). Although interpretation of the Lagrangian multiplier values is not a central part of this process, in many instances they can be interpreted as a measure of the importance of the ith constraint.

As noted earlier, Lagrangian relaxation is a generalization of the above procedure. The idea is to relax the constraints by amending the objective function. The relaxed problem is then solved (if possible) and this can provide a lower (or upper) bound to the set of possible solutions to the original problem. The resultant smaller solution space can then be subjected to more systematic search or attempts can be made to narrow the upper and lower bounds until they meet.

For example, suppose that we are attempting to maximize the simple linear expression:

Additional conditions may be that all the xj must be greater than or equal to 0, and for problems with discrete solutions, that the solution values must be integers.

We can relax this problem whilst retaining the linear form of the objective function, by moving the constraints into the objective function whilst dropping the inequality, as follows:

If all λi=0 the second term in the objective function disappears and thus the constraints play no part in the solution, but if any λi>0 the value of the objective function we are trying to optimize is penalized (increased or decreased) when we break the constraints. By adjusting the degree to which the original problem is relaxed, varying the positive vector λ (the Lagrangian) in a systematic manner, and/or by alternative search procedures to the reduced solution space, closer approximations to the original problem can be obtained. Lagrangian relaxation is discussed in some detail in Section 7.4.2, Larger p-median and p-center problems, which deals with its application to optimum location problems.

The preceding discussion becomes clearer if we take a simplified example. For the purposes of this subsection we use an example provided in the excellent introductory text on combinatorial programming and spatial analysis by Scott (1971):

By dealing with a problem that only involves two variables we can plot this as a graph in which each of the three constraints is shown as a thin line (Figure 7‑3A). Here the horizontal axis is x1 and the vertical axis, x2. The three thin colored lines correspond to the three constraints graphed as linear equations g1 (blue), g2 (green) and g3 (red). The solution space is bounded by these three lines (as a result of the inequalities we have defined) and the two axes.

A. LP solution for constrained linear optimization |
B. Simple relaxed optimization |

The thick lines show the objective function for increasing values of z. The leftmost line is z=50, then z=55, 60 and 65. The largest possible value the objective function can take is shown by the rightmost thick line, where it just remains inside the two constraint lines illustrated and below the third constraint line.

For this problem the real-valued solution is x1=4.96 and x2=2.45, but integer solutions may be obtained by searching the values at the neighboring grid line intersections shown that give the highest value of the objective function, in this instance x1=4 and x2=3. This problem can be re-cast, but this time using Lagrangian relaxation applied to a single constraint, as follows:

As can be seen, the third constraint has now been incorporated into the objective function. With this change the objective function can fall outside the boundaries set by the third constraint for all values of λ>0, as shown by the rightmost line (Figure 7‑3B). In this instance the same values for z as above have been used but are augmented by the relaxation element. The relaxation of the third constraint has allowed the objective function to stray beyond the limits of the fully constrained solution. In general, all the inequality constraints can be incorporated into such a relaxation process, giving the more general expression for functions f() and g() as apply in the classical model described earlier.

The above examples have been applied to simple linear expressions, which is not in practice necessary. However, with more complex functions and problems (e.g. constrained shortest path problems, optimum facility location problems) the use of Lagrangian relaxation has been found to be an extremely valuable tool in identifying high quality solutions within acceptable time limits.

Ant systems and ant colony optimization (ACO)

Ant systems draw their inspiration from the behavior of ants as they forage for food. It had been observed that ants lay trails using pheromones as they explore their environment. Successful trails (e.g. those leading to and from a food source) become reinforced as more and more ants use them (a form of learning or collective memory). This led, in the early 1990s, to research into the use of such behavioral models to solve difficult combinatorial problems such as the traveling salesman problem (TSP). The original computational model, known as Ant Systems (AS) worked reasonably well for small TSP problems but not for larger instances. This encouraged the development of more sophisticated models, particularly combining basic AS ideas with efficient local search (LS) strategies. This research area has been dominated by the work of a few key scientists, whose work is best seen via the website www.aco-metaheuristic.org, the book, “Ant Colony Optimization” by Dorigo and Stützle (2004) and Dorigo’s ACO article on Scolarpedia. Their work has shown (see for example, Stützle and Dorigo, 1999) that variants of ACO can solve quite large TSP problems exactly or close to the known optimum, with acceptable levels of performance. The approach has also been found effective in vehicle routing problems with time window constraints on deliveries, and in certain dynamic routing problems. However, unlike some other meta-heuristics, its broader-scale application is less obvious although it has clear links to the field of so-called Swarm Intelligence (see further Section 8.2.3, Agents and agent-based models). For example, in connection with the latter, Batty, Desyllas and Duxbury (2002, 2003) successfully applied such ideas to modeling pedestrian behavior in crowded streets using pheromone surfaces rather than discrete trails. The basic ACO procedure for the TSP is as follows:

Let m be the number of ants to be used, n be the number of cities (m<=n) and t be the iteration count of the process. Let dij be a measure of the distance between cities, and define problem parameters: α, β (see below). We also define an initial pheromone value, τij to each arc (i,j) connecting cities i and j.

•Place each of the ants on a randomly chosen city

•Construct a tour of the cities for each ant as follows: For an ant currently at city i visit an unvisited city j with a probability defined by the current pheromone strength for that link and on how far away the city i is from j (see further, below)

•Optionally improve each of the individual tours generated by the ants using a local search heuristic

•Repeat this process for each ant and when complete, update the pheromone trail values, as shown below

Defining the probability function: The standard probability function (i.e. should the ant now visit city j?) applied for ant k currently at city i during iteration t is of the form:

In this expression the set N is the currently valid neighborhood for this ant, i.e. the set of cities not yet visited. This probability function is a combination of two components: the first is the strength of the pheromone trail and the second is a distance decay factor. If α=0 then the pheromone component has no impact and the probabilistic assignment is based on whichever city is closest, i.e. the basic greedy heuristic, whilst if β=0 assignment is simply based on pheromone trail strength, which has been found to lead to stagnation of the solutions giving sub-optimal tours. Example test values for the parameters are α=1 and β=2

Updating the pheromone trails: in theory the pheromone trails could be updated continuously or after each ant has completed a tour. In practice it has been found that updating the trail values after all ants have completed an iteration is more effective. The updating involves two components: the first is to reduce all trail values by a constant factor, ρ (analogous to an evaporation process); and then to add an increment based on the length of the kth ant’s tour, Lk(t), for all ants:

This updating ensures that firstly, links do not become over-saturated with pheromones and secondly, that links visited by ants that make the shortest tours are those whose pheromone values are increased the most ― a re-enforcement or learning process. Stützle and Dorigo (1999) found that the quality of solutions could be considerably improved by minor modifications to this scheme. Firstly, by restricting the maximum and minimum values the pheromone values can take, with initialized values being set to the upper limit of this range; and secondly, by only allowing updating of trail values by the best performing ant rather than all ants (this approach can also reduce the memory overhead of the algorithm considerably). With these modifications the authors were able to solve problems with 1500+ cities (e.g. TSPLIB test case fl1577) giving a range of results [22286,22358] which compares with the true optimum solution for this problem of 22249.