Overview of shortest path problems

Unlike some of the previous problems, the general shortest path (SP) problem requires a predefined network. The basic problem is then to determine one or more shortest (or least cost) routes between a source vertex and a target vertex where a set of edges are given. In several cases SP algorithms also provide shortest paths from a single source to all or many of the other vertices in a network as a side-effect of the procedure. Such algorithms are known as single-source SPAs or single-source queries. The set of shortest paths generated from a single source is known as a shortest path tree (SPT).

The determination of shortest paths can be specified as a linear programming problem, as follows. Let s be the source vertex, t be the target vertex and let cij>0 be the cost or distance associated with the link or edge (i,j). Then we seek to minimize z, where:

Although this formulation enables shortest path problems to be solved by standard LP software, and is guaranteed to yield all-integer solutions, in practice it is far less efficient than utilizing specialized network aware algorithms.

Typical exact algorithms for computing shortest paths are those of Dijkstra (1959, see below) and Dantzig (1960, see below), both of which are single source SPAs. These ‘greedy’ algorithms applied to planar graphs with non-negative edge weights typically require O(n2) time, where n is the number of vertices. However, Henzinger, Klein and Rao (1997) have shown that a near linear time algorithm is possible, i.e. O(n). The complexity of Dijkstra’s original algorithm can be reduced to O(m+nlogn) using Fibonacci heaps, where m represents the number of edges.

In some cases shortest path problems are subject to additional side constraints. For example, the shortest acceptable route in a telecommunications network may be subject to certain quality requirements — if some routes have higher quality than others, these may be preferable despite the time, cost or distance of the route being greater. Another example might be a limit on the total budget available for a given trip, where some routes have higher costs than others. Such side conditions can be incorporated into the model definition as one or more inequalities, where each network edge has not only a cost or distance value, but also one or more positive weights. Collectively these define the general Constrained Shortest Path Problem (CSPP), good solutions for which can be achieved via a number of techniques, including Lagrangian relaxation (see for example, Beasley and Christofides, 1989, and Carlyle and Wood, 2003). Unlike SPPs, which can be solved in polynomial time, CSPPs are NP-Complete.

Dantzig algorithm

Figure 7‑8 illustrates Dantzig’s step-by-step algorithm for determining the shortest path from vertex 1, 2, 3 and 4 to all other vertices in a directed planar graph (a digraph) with positive edge weights. The algorithm is a form of discrete dynamic programming. Figure 7‑8A shows the graph and Figure 7‑8B-D the steps involved in determining the shortest path from vertex 1 to 3, highlighted in red.

Figure 7‑8 Dantzig shortest path algorithm

The basic steps in this procedure are as follows:

1: identify the shortest (least distance/cost/time) edge from vertex 1 — this is to vertex 2 (cost=4). Add vertex 2 and the edge or link from 1 to 2 to the tagged set. If a tie occurs, arbitrarily choose one of the edges

2: identify the shortest (least cumulative cost/time) edge from vertex 1 or from vertex 2 plus edge (1,2) distance — this is to vertex 4 from 2 (cost=6). Add vertex 4 and edge 2 to 4 to the tagged set

3: identify the shortest (least cumulative cost/time) edge from the tagged set — this is from vertex 1 to 2 to 4 to 3 (cost=7)

Stop — all vertices reached; repeat from vertex 2, 3 and 4 to compute all shortest paths from every vertex

This algorithm is very similar to the standard Dijkstra procedure:

Dijkstra algorithm

This algorithm works by storing the cost of the shortest path found so far between the start vertex, s, and each target vertex, t. We shall denote this distance d(t).

The basic steps in this classic algorithm are as follows:

1: initialize all vertices such that d(t)=∞ (infinity, or in practice, a very large value) and d(s)=0

2: For each edge leading from s, add the edge length from s to the current value of the path length at s. If this new distance is less than the current value for d(t) replace this with the lower value

3: choose the lowest value in the set d(t) and move the current (active) vertex to this location

4: iterate steps 2 and 3 until the target vertex is reached or all vertices have been scanned

Note that the distance to every vertex reached is contained in the vector d(t). If the entry is infinity then it means the vertex is unreachable or has not been scanned. Also note that unlike the Dantzig algorithm the shortest path(s) are not generated, only their lengths. To obtain these the algorithm must be altered to retain a list of shortest path vertices, e.g. by storing both the shortest path distance and the preceding vertex address of the shortest path in the vertex label.

It should also be noted that basic Dijkstra-type algorithms provide inconsistent run-times on real-world networks, principally because such networks have very limited vertex connectivity (their adjacency matrices are very sparse). SPAs need to be either very efficient implementations of these basic algorithms or should use heuristics, such as the A* algorithm, to obtain faster solutions (e.g. for real-time/interactive applications).

A* algorithm

Dijkstra’s algorithm is a special case of a more general algorithm known as A*. Strictly speaking A* is a goal-directed best-first algorithm. It visits nodes in an order that may be preferable (faster) than the simple sweep of all nodes that the Dijkstra algorithm adopts. To select which node to visit next it typically computes the Euclidean distance of each vertex from the target and adds to this the distance currently recorded via the network to this vertex. Vertices are visited in overall distance priority order. Typically the A* algorithm will visit fewer nodes for a specified source-target (exact) route than other algorithms and will thus on average be faster — often a lot faster.

GIS implementations of SPAs

Most GIS packages do not provide information on the algorithms used for this class or any similar network analysis problems, nor references to indicate the procedures used. Only the experience of using a particular implementation will confirm the nature of its behavior, both in terms of computational complexity (time and space) and quality (optimality). Systematic testing of sample networks and problems based on a range of sizes (vertices) and plotting the results will provide a good indication as to the resource requirements and practicality of using any particular software. If a preferred package (e.g. the corporate standard) does not perform in the manner desired, then solving such problems externally using tools such as Concorde or LEDA, and then re-importing the results back into the GIS, may be the best option. These latter packages, and similar software, do provide details of the algorithms used, the time complexity of their solution procedures, and in many cases optional source code and linkable library code.

Figure 7‑9 illustrates the process of carrying out simple variants of SPA using ArcGIS Network Analyst and a real street network. The source dataset for the network was a shapefile of the street layout for Salt Lake City (SLC), Utah, USA. This is available as a downloadable file from the SLC authorities. Since the file is not in network-ready form it was processed using ArcCatalog to automatically create a network file that includes vertices at every intersection and end of line. Without further alteration (unrealistic in practice) this file was then used to compute the shortest path connecting four locations (Figure 7‑9A). In this case the locations were selected interactively and parameters chosen for the computation. These included permitting U-turns and including two barriers (indicated by X symbols on the map).

The shortest path shown in Figure 7‑9B is technically an open tour, or sequential ordering process, in that it is a connected series of shortest paths from 1-2, 2-3, and finally 3-4. It can also be seen as a shortest path from location 1 to location 4, with the constraint that the route must go via locations 2 and 3 — a very common requirement. A closed tour, returning to location 1, could have been generated by adding a shortest path from location 4 to 1.

Figure 7‑9 Salt Lake City — Sample networking problems and solutions

A. Delivery locations and road barriers |
B. Shortest path solution (sequential tour from 1-4) |

C. Shortest path solution (optimal tour from 1-4) |
D. Closest facilities to an incident |

The tour described above is not optimal if the sequential condition is dropped — the tour 1-4-3-2 is shorter (Figure 7‑9C). This latter tour was generated using the same dataset but this time using the Manifold package, Transform toolbar facilities. Note that there are some obvious differences in the solution map, other than the totally different routing. One of these is the lack of U-turns, an option which is not enabled in this instance. Another is the difference in default orientation of the display, which has no real significance. Manifold’s solution is obtained by exhaustive search where there are 10 or fewer locations to be linked (i.e. an exact solution), and applies an heuristic algorithm (specific details are not provided) for 11+ sites. A discussion of optimal tours and associated algorithms in provided in Section 7.3.5, Tours, traveling salesman problems and vehicle routing.

Figure 7‑9D shows the result of addressing a related problem — that of identifying the closest facility or facilities to a specified location such as an incident. In this example the solution has again been generated using ArcGIS, which has identified two locations within a given distance or time from an incident location at the bottom of the map, together with optimal routes to or from the incident. Note that again U-turns have been permitted and that, depending on the network characteristics (turn permissions, one-way streets, traffic speeds etc.), the route to an incident may well differ from the route away from the incident. Often, of course, the routes to and from a location will be to and from different third locations (e.g. ambulance station and hospital).

Further SPAs applications

Shortest path algorithms provide a key component of many problems in spatial analysis. The primary group of applications are those involving transport network routing, for example traveling from one location to another across a network with various constraints (e.g. turn restrictions, speed restrictions, directional restrictions). Extensions of this basic functionality that include other considerations such as traffic flows, second, third or kth shortest paths, computation of tours etc., all form standard building blocks of solutions to network and location analysis optimization problems. But there are additional application areas that are less obvious. One example is the extension of SPAs to non-network spatial environments. For example, a shopping mall, an airport or a national park, are all examples of spaces that pedestrians or maybe electronic buggies or bicycles might traverse, and computation of paths in such spaces is frequently of interest. In such cases one approach is to overlay a lattice on the available space and fully connect adjacent lattice nodes/centers or edges. This provides a high density set of links that may be used to select alternative routes ― perhaps shortest routes to emergency exits (note that this idea is not restricted to 2D spaces). An example of this idea applied to an office environment is shown in Figure 7‑10, where a standard SPA algorithm has been applied with the additional constraint that a minimum distance from obstacles (e.g. desks, chairs etc.) must be maintained.

Figure 7‑10 Shortest obstacle-avoiding path

There are many alternative approaches to solving the kind of problem illustrated in Figure 7‑10. One group falls into the general area of computational geometry. In this view all elements of the problem are treated as polyhedral objects and the solution space is partitioned to identify true shortest (Euclidean) paths ― in the example illustrated such methods would identify a direct path (a straight line) for the first part of the route shown, rather than the jagged route generated by the lattice-based SPA. Indeed, all lattice-based SPAs will generate biased ‘shortest’ routes (directionally and distance biased). One (partial) solution to such problems is to solve the lattice or raster variant of the problem first, using simple and fast algorithms, and then seek to post-optimize the resulting solution, for example by seeing if sections of the route can be straightened whilst meeting the constraints applied.

SPAs may also be applied to more abstract problems, where the network graph does not represent a spatial problem but a logical problem. Examples include searching certain types of database, or arranging a series of processes in such a way as to minimize the overall time or cost of an operation (the latter is known as a scheduling problem).