If the dataset we have is not zone-based but point- or line-based, alternative methods of determining density are required. The simplest measures apply some form of zoning and count the number of events in each zone, returning us to a zone-based dataset. For example, with data on thefts from cars (TFC events), each event may be geocoded and assigned to a unique zone such as a cell of a pre-defined 100 m × 100 m grid, or to city blocks, census blocks, or policing districts. The density of TFC events is then simply the event count divided by the area of the zone in which the events are reported to have occurred. A difficulty with this approach is that densities will vary substantially depending on how the zoning or grid is selected, and on how variable the size and associated attributes of each zone are (e.g. variations in car ownership and usage patterns). Overall density calculation for all zones (e.g. a city-wide figure for the density of TFC events) suffers similar problems, especially with regard to definition of the city boundary. Use of such information to compare crime event levels within or across cities is fraught with difficulties.
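The dependence of zone-based density on the choice of grid can be illustrated with a minimal sketch. The event coordinates and the two cell sizes below are hypothetical values chosen for illustration, and `grid_densities` is an illustrative helper rather than a function from any GIS package.

```python
from collections import Counter

def grid_densities(events, cell_size):
    """Count events per square grid cell and divide by the cell area."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in events
    )
    area = cell_size * cell_size
    return {cell: n / area for cell, n in counts.items()}

# Hypothetical event coordinates in metres (illustrative only).
events = [(20.0, 30.0), (80.0, 95.0), (120.0, 40.0), (150.0, 60.0), (310.0, 220.0)]

coarse = grid_densities(events, cell_size=200.0)  # 200 m x 200 m cells
fine = grid_densities(events, cell_size=100.0)    # 100 m x 100 m cells

for label, result in (("200 m grid", coarse), ("100 m grid", fine)):
    print(label, {cell: round(d, 6) for cell, d in sorted(result.items())})
```

With these sample points the highest cell density doubles when the cell size is halved, although the events themselves have not changed.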
An alternative approach to density computation for two-dimensional point-sets is based on techniques developed in one-dimensional (univariate) statistical analysis. This is simplest to understand by looking at the example in Figure 4‑40. We have a set of 5 events marked by crosses. They occur at positions 7, 8, 9, 12 and 14 along the line. We could argue that the point density across the entire 20-unit line length is 5/20=0.25 points per unit length and assign a value of 0.25 to each section, as shown on the grey line. We might equally well argue that if we divide the overall line into two halves, the density over the first half should be 0.3 per unit length and over the second, 0.2 per unit length, to reflect the variation in positioning of the 5 events.
There is clearly no single right answer or single method to assign the points to the line’s entire length, so the method we choose will depend on the application we are considering. Important observations about this problem include: the length of the line we start with seems to have an important effect on the density values we obtain, and since this may be arbitrary, some method of removing dependence on the line length is desirable; if the line is partitioned into discrete chunks a sudden break in density occurs at each partition boundary, which is often undesirable; depending on the number of partitions and distribution of points, areas may contain zero density, even if this is not the kind of spread we are seeking or regard as meaningful; the line is assumed to be continuous, and allocation of density values to every part is valid; and finally, if we have too many partitions all sections will only contain values of 1 or 0, which takes us essentially back to where we started.
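The partitioning arithmetic for the five events can be reproduced with a short sketch. The function name `binned_density` is illustrative, and the rule that a point lying exactly on a boundary belongs to the partition on its right is an assumption made here.

```python
def binned_density(points, length, n_bins):
    """Events per unit length in each of n_bins equal partitions of [0, length]."""
    width = length / n_bins
    counts = [0] * n_bins
    for p in points:
        # Clamp so that a point exactly at the right-hand end falls in the last bin.
        index = min(int(p // width), n_bins - 1)
        counts[index] += 1
    return [c / width for c in counts]

events = [7, 8, 9, 12, 14]

whole_line = binned_density(events, 20, 1)     # one partition: the whole line
two_halves = binned_density(events, 20, 2)     # two partitions of 10 units
unit_segments = binned_density(events, 20, 20) # twenty partitions of 1 unit

print(whole_line)
print(two_halves)
print(unit_segments)
```

The same five events give 0.25 everywhere, 0.3 and 0.2 for the two halves, or only values of 1 and 0 once every partition is one unit long.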
These observations can be dealt with by treating each point in the original set as if it were spread over a range, then adding together overlapping zones and checking that the total adds up to the original value. For example, taking each point and smoothing it over 5 units in a uniform symmetric manner, we obtain the result shown in Figure 4‑41. The rows in the diagram show the spreading of each of the 5 original points, with the total row showing the sums (densities) assigned to each unit segment. These add up to 5, as they should, and a chart showing this distribution confirms the pattern of spread. This method still leaves us with some difficulties: there are no density values towards the edges of our linear region; density values still jump abruptly from one value to the next; and values are evenly spread around the initial points, whereas it might be more realistic to have a greater weighting of importance towards the centre of each point.
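The uniform spreading can be sketched as follows. It is assumed here that the line is divided into unit segments numbered 1 to 20, and that each event contributes one fifth of its value to the segment it falls in and to the two segments on either side. The exact segment alignment used in the figure may differ.

```python
def uniform_spread(points, n_segments, window=5):
    """Spread each unit event evenly over `window` unit segments centred on it."""
    half = window // 2
    totals = [0.0] * (n_segments + 1)  # index 0 unused; segments are 1..n_segments
    for p in points:
        for s in range(p - half, p + half + 1):
            # Any share falling outside the line is dropped (none does in this example).
            if 1 <= s <= n_segments:
                totals[s] += 1.0 / window
    return totals[1:]

density = uniform_spread([7, 8, 9, 12, 14], n_segments=20)

print([round(v, 1) for v in density])
print(round(sum(density), 6))
```

The segment totals sum to 5, the values step abruptly between neighbouring segments, and the segments near both ends of the line receive nothing.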
All of these concerns can be addressed by selecting a well-defined, smooth and optionally unbounded function, known as a kernel, and using this to spread the values. The function often used is a Normal distribution, which is a bell-shaped curve extending to infinity in each direction, but with a finite (unit) area contained underneath the bell (Figure 4‑42). In this diagram, for each point (7, 8, 9, 12 and 14) we have provided a Normal distribution curve with central value (the mean) at the point in question and with an average spread (standard deviation) of one unit. We can then add these curves together to obtain the brown (cumulative) upper curve with two peaks, which encloses a total area of 5, and then divide this curve by 5 if we want to adjust the area under the curve back to 1. This gives the red curve shown, and is effectively a form of normalisation of the distribution (a term not to be confused with the Normal distribution itself). When adjusted in this way the values are often described as probability densities, and when extended to two dimensions, the resulting surface is described as a probability density surface, rather than a density surface.
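A minimal sketch of this construction, using only the five event positions and the unit standard deviation given in the text, is shown below. The function names and the simple numerical integration over the range -10 to 30 are illustrative choices.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal probability density with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def summed_curve(x, points, sd=1.0):
    """Sum of the Normal curves centred on each event (area = number of events)."""
    return sum(normal_pdf(x, p, sd) for p in points)

def probability_density(x, points, sd=1.0):
    """Summed curve divided by the number of events (area = 1)."""
    return summed_curve(x, points, sd) / len(points)

events = [7, 8, 9, 12, 14]

# Approximate the areas under both curves with a simple Riemann sum.
step = 0.01
xs = [-10.0 + i * step for i in range(4001)]
area_summed = sum(summed_curve(x, events) for x in xs) * step
area_density = sum(probability_density(x, events) for x in xs) * step

print(round(area_summed, 4), round(area_density, 4))
```

The summed curve is higher near 8 and near 13 than in the trough between the two groups of events, which gives the two peaks described above.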
Figure 4‑42 Univariate Normal kernel smoothing and cumulative densities

Figure 4‑43 Alternative univariate kernel density functions

We now have a density value for every position along the original line, with smooth transitions between the values, which is exactly what we were trying to achieve. There still remain some questions: Why should we use the Normal distribution? Could we not use almost any symmetric function with a finite area under it? And why did we choose a value of 1 unit for the average spread? The answer to these questions is that the specific selections made are a matter of choice and experience, although in some instances a symmetric distribution with a finite extent (e.g. a box or triangular function) may be regarded as more suitable than one with an infinite possible extent.
Figure 4‑43 shows a selection of commonly used functions plotted for the same point set, using the MATLAB Statistics Toolbox function ksdensity(). For further details on these and other functions used for smoothing and density estimation see Bowman and Azzalini (1997) and Silverman (1986). As may be guessed from examining the various curves shown in Figure 4‑43, the exact form of the kernel function does not tend to have a major impact on the set of density values assigned across the linear segment (or area in 2D applications). Of much greater impact is the choice of the spread parameter, or bandwidth.
All of this discussion addresses problems in one dimension (univariate smoothing). We now need to extend the process to two dimensions, which turns out to be simply a matter of taking the univariate procedures and adding a second dimension (effectively rotating the function about each point). If we were to use the Normal distribution again as our smoothing function it would have a two-dimensional bell-shaped form over every point (Figure 4‑44). As before, we place the kernel function over each point in our study region and calculate the value contributed by that point over a finely drawn grid. The grid resolution does not affect the resulting surface form to any great degree, but if possible should be set to be meaningful within the context of the dataset being analysed, including any known spatial errors or rounding that may have been applied, and making allowance for any areas that should be omitted from the computations (e.g. industrial zones, water, parks etc. when considering residential-based data). Values for all points at every grid intersection or for every grid cell are then computed and added together to give a composite density surface. This may then be plotted in 2D (e.g. as density contours) or as a 3D surface.
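The grid-based procedure can be sketched as below, assuming a bivariate Normal kernel evaluated at the centre of each grid cell. The event coordinates, grid extent, cell size and bandwidth are hypothetical values, and no areas are masked out.

```python
import math

def normal_kernel_2d(d, h):
    """Bivariate Normal kernel at distance d; h is the bandwidth (standard deviation)."""
    return math.exp(-0.5 * (d / h) ** 2) / (2.0 * math.pi * h * h)

def density_grid(events, x_min, y_min, n_cols, n_rows, cell, h):
    """Summed kernel value (events per unit area) at the centre of every grid cell."""
    grid = []
    for row in range(n_rows):
        gy = y_min + (row + 0.5) * cell
        grid.append([
            sum(normal_kernel_2d(math.hypot(gx - ex, gy - ey), h) for ex, ey in events)
            for gx in (x_min + (col + 0.5) * cell for col in range(n_cols))
        ])
    return grid

# Hypothetical event locations in arbitrary map units.
events = [(4.0, 4.0), (5.0, 5.0), (6.0, 4.0), (12.0, 13.0)]
cell = 0.5
surface = density_grid(events, 0.0, 0.0, n_cols=40, n_rows=40, cell=cell, h=1.0)

# Density times cell area, summed over the grid, approximately recovers the event count.
total = sum(v * cell * cell for row in surface for v in row)
print(round(total, 3))
```

Because the kernel is unbounded, a small part of each event's contribution falls outside any finite grid, so the recovered total is slightly below the number of events.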
Figure 4‑44 2D Normal kernel

The resulting grid values may be provided as: (i) relative densities, which give values in events per unit area (i.e. they are adjusted by the grid size, giving a figure in events per square metre or per hectare); this is the default or only option in many GIS packages, including ArcGIS; (ii) absolute densities, which give values in events per grid cell and hence are not adjusted by cell size, so the sum of the values across all cells should equal the number of events used in the analysis; or (iii) probabilities, which are as per (ii) but divided by the total number of events. Crimestat supports all three variants.
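The relationship between the three output forms is simple arithmetic, sketched here for a hypothetical 2 × 2 grid of relative densities with 10 m × 10 m cells and 10 events. The function names are illustrative and do not correspond to options in any particular package.

```python
def to_absolute(relative, cell_area):
    """Events per unit area -> events per grid cell."""
    return [[v * cell_area for v in row] for row in relative]

def to_probabilities(absolute, n_events):
    """Events per grid cell -> proportion of all events in each cell."""
    return [[v / n_events for v in row] for row in absolute]

# Hypothetical relative densities in events per square metre.
relative = [[0.01, 0.02],
            [0.03, 0.04]]
cell_area = 10.0 * 10.0   # square metres per cell
n_events = 10

absolute = to_absolute(relative, cell_area)
probabilities = to_probabilities(absolute, n_events)

print(absolute)
print(probabilities)
```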
In Figure 4‑45 the kernel density procedure has been applied to a dataset of reported cases of lung cancer in part of Lancashire, England. Cases are shown as points in this map, with areas of higher kernel density being shown in darker tones. The highlighted point in the lower left of the map is the location of a disused incinerator (the white oval is a hypothetical plume extent based on the prevailing wind direction).
Figure 4‑45 Normal kernel density map, lung cancer cases

This example is discussed further in Section 5.4.3, and fully in Diggle (1990). The software used in this instance was Crimestat, with a Normal kernel function and average spread (bandwidth) determined from the point pattern itself. Map data were output in ESRI shapefile format (SHP) and mapped in ArcGIS.
Other GIS packages support a variety of kernel functions and procedures. ArcGIS Spatial Analyst provides kernel density estimation for point and line objects, but only supports one kernel function, which it describes as a quadratic kernel (a bounded kernel) but which is often described as an Epanechnikov kernel (see further, Table 4‑8). MapInfo’s grid analysis add-on package, Vertical Mapper, includes kernel density mapping, as do the spatial statistics package SPLANCS and the crime analysis add-on for MapInfo, Hot Spot Detective (in the latter two cases based on quartic kernels). TransCAD/Maptitude supports what it describes as density grid creation with the option of count (simple), quartic, triangular or uniform kernels. Crimestat supports four alternative kernels to the Normal, all of which have finite extent (i.e. typically are defined to have a value of 0 beyond a specified distance). These are known as the quartic, exponential, triangular and uniform kernels.
The details of each of the main kernel functions used in various GIS packages are as shown in Table 4‑8. The value at grid location $g_j$, at a distance $d_{ij}$ from an event point $i$, is obtained as the sum of individual applications of the kernel function over all event points in the source dataset. The table shows normalised functions, where the distances $d_{ij}$ have been divided by the kernel bandwidth, $h$, i.e. $t=d_{ij}/h$. Graphs of these functions are shown in Figure 4‑46, where each has been normalised such that the area under the graph sums to 1.
Whether the kernel for a particular event point contributes to the value at a grid point depends on: (i) the type of kernel function (bounded or not); (ii) the parameter, $k$, if applicable, which may be user defined or determined automatically in some instances; and (iii) the bandwidth, $h$, that is selected (a larger bandwidth spreads the influence of event points over a greater distance, but is also more likely to experience edge effects close to the study region boundaries). Event point sets may be weighted, resulting in some event points having greater influence than others.
Table 4‑8 Widely used univariate kernel density functions

| Kernel | Formula | Comments. Note $t=d_{ij}/h$, $h$ is the bandwidth |
|---|---|---|
| Normal | $\frac{1}{\sqrt{2\pi}}\,e^{-t^{2}/2}$ | Unbounded, hence defined for all $t$. The standard kernel in Crimestat; bandwidth $h$ is the standard deviation (and may be fixed or adaptive) |
| Quartic | $k\,(1-t^{2})^{2},\ \lvert t\rvert \le 1$ | Bounded. Approximates the Normal. $k$ is a constant |
| Exponential | $A\,e^{-k\lvert t\rvert}$ | Optionally bounded. $A$ is a constant (e.g. $A=3/2$) and $k$ is a parameter (e.g. $k=3$). Weights more heavily to the central point than other kernels |
| Triangular | $1-\lvert t\rvert,\ \lvert t\rvert \le 1$ | Bounded. Very simple linear decay with distance |
| Uniform (flat) | $k,\ \lvert t\rvert \le 1$ | Bounded. $k$ is a constant. No central weighting, so the function is like a uniform disk placed over each event point |
| Epanechnikov (paraboloid/quadratic) | $\frac{3}{4}\,(1-t^{2}),\ \lvert t\rvert \le 1$ | Bounded; optimal smoothing function for some statistical applications; used as the smoothing function in the Geographical Analysis Machine (GAM) and in ArcGIS |
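The kernel shapes named in Table 4‑8 can be sketched as functions of the scaled distance t, using their standard univariate forms. The constants are assumptions chosen here so that each curve encloses unit area in one dimension: k=15/16 for the quartic, k=1/2 for the uniform, and A=3/2 with k=3 for the exponential (the example values quoted in the table comments). Individual packages may use different constants, particularly for two-dimensional use.

```python
import math

def normal(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def quartic(t, k=15.0 / 16.0):          # k assumed so the 1D area is 1
    return k * (1.0 - t * t) ** 2 if abs(t) <= 1.0 else 0.0

def exponential(t, a=1.5, k=3.0):       # example values A=3/2, k=3
    return a * math.exp(-k * abs(t))

def triangular(t):
    return 1.0 - abs(t) if abs(t) <= 1.0 else 0.0

def uniform(t, k=0.5):                  # k assumed so the 1D area is 1
    return k if abs(t) <= 1.0 else 0.0

def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def area(kernel, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule estimate of the area under a kernel between lo and hi."""
    step = (hi - lo) / n
    return sum(kernel(lo + (i + 0.5) * step) for i in range(n)) * step

kernels = [normal, quartic, exponential, triangular, uniform, epanechnikov]
areas = {k.__name__: area(k) for k in kernels}

for name, value in areas.items():
    print(name, round(value, 3))
```

With these constants the exponential kernel has the largest value at t=0, which matches the comment that it weights most heavily towards the central point.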
Bandwidth selection is often more of an art than a science, but it may be subject to formal analysis and estimation, for example by applying kernel density estimation (KDE) procedures to sets of data where actual densities are known. An alternative to fixed bandwidth selection is adaptive selection, whereby the user specifies the selection criterion, for example defining the number of event points to include within a circle centred on each event point, and taking the radius of this circle as the bandwidth around that point.
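Adaptive selection of the kind described can be sketched as follows. It is assumed here that the selection criterion counts other events only, excluding the event at the centre of the circle, and the coordinates and the `min_points` value are hypothetical.

```python
import math

def adaptive_bandwidths(events, min_points):
    """Bandwidth per event: radius of the circle centred on the event that
    just contains `min_points` other events (an assumed counting rule)."""
    bandwidths = []
    for i, (xi, yi) in enumerate(events):
        distances = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
        bandwidths.append(distances[min_points - 1])
    return bandwidths

# Hypothetical events: a tight cluster of four and one isolated point.
events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
bandwidths = adaptive_bandwidths(events, min_points=2)

print([round(b, 3) for b in bandwidths])
```

Events in the cluster receive a small bandwidth, while the isolated event receives a much larger one, so smoothing is stronger where the data are sparse.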
Kernel smoothing, or kernel density estimation methods (KDE methods) of the type described have a variety of applications: point data smoothing; creation of continuous surfaces from point data in order to combine these with other datasets that are continuous/in raster form; probability distribution estimation; interpolation (although this terminology is confusing and not recommended — Crimestat is amongst a number of packages that use this terminology, which is essentially incorrect); and hot spot detection. KDE can also be used in visualising and analysing temporal patterns, for example crime events at different times of day and/or over different periods, with the objective of understanding and potentially predicting event patterns. KDE can also be applied with more than one point set, for example a set of cases and a set of controls. The output of such “dual” dataset analysis is normally a ratio of the primary set to the secondary set, with the objective being to analyse the primary pattern with background effects being removed or minimised. Care must be taken in such cases that the results are not subject to distortion by very low or zero values in the second density surface. Crimestat provides support for 6 alternative dual density outputs: simple ratio; log ratio; difference in densities (two variants, with and without standardisation); and sum of densities (again, with and without standardisation). Levine (2007, Chapter 8) provides an excellent discussion of the various techniques and options, including alternative methods for bandwidth selection, together with examples from crime analysis, health research, urban development and ecology.
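The caution about very low or zero values in the second density surface can be illustrated with a minimal sketch of a dual density calculation at a single location. This is not Crimestat's implementation: the Normal kernel, the `floor` threshold and the sample coordinates are all assumptions made for illustration.

```python
import math

def kernel_sum(x, y, points, h):
    """Sum of bivariate Normal kernels (bandwidth h) evaluated at (x, y)."""
    return sum(
        math.exp(-0.5 * (math.hypot(x - px, y - py) / h) ** 2) / (2.0 * math.pi * h * h)
        for px, py in points
    )

def dual_outputs(x, y, cases, controls, h, floor=1e-9):
    """Simple ratio, log ratio and difference of two densities at one location.

    The ratio-based outputs are returned as None where the secondary density
    is below `floor`, rather than dividing by a value close to zero.
    """
    primary = kernel_sum(x, y, cases, h)
    secondary = kernel_sum(x, y, controls, h)
    difference = primary - secondary
    if secondary < floor:
        return None, None, difference
    ratio = primary / secondary
    log_ratio = math.log(ratio) if ratio > 0.0 else None
    return ratio, log_ratio, difference

# Hypothetical case and control locations.
cases = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5)]
controls = [(1.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]

near = dual_outputs(1.0, 1.0, cases, controls, h=1.0)
far = dual_outputs(50.0, 50.0, cases, controls, h=1.0)

print(near)
print(far)
```

Far from all control points the secondary density is effectively zero, so the sketch reports no ratio there instead of an arbitrarily large or undefined value.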
Figure 4‑46 Univariate kernel density functions, unit bandwidth
Panels: A. Constant; B. Normal, SD=1; C. Exponential; D. Quadratic; E. Quartic; F. Triangular
The use of kernel functions as a form of point weighting enables the creation of local weighted means and variances. The locally weighted (kernel) statistics supported within the GWR software are defined as follows:

$$\bar{x}(u) = \frac{\sum_i w_i(u)\,x_i}{\sum_i w_i(u)}$$

and

$$\sigma^{2}(u) = \frac{\sum_i w_i(u)\,\bigl(x_i - \bar{x}(u)\bigr)^{2}}{\sum_i w_i(u)}$$

where $x_i$ is the value observed at data point $i$ and $w_i(u)$ is the weight given to that point at location $u$. These are local statistics, based on locations, $u$, and weighting function $w()$ defined by a fixed or adaptive kernel function. The bandwidth in this case must be user-defined, and might be chosen on the basis of experience or could use an adaptively derived bandwidth (e.g. from an associated regression study) for comparison purposes.
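A minimal sketch of a locally weighted mean and variance is given below. It assumes the usual weighted form in which each statistic is normalised by the sum of the weights, and a fixed-bandwidth Gaussian weighting function. The sample locations, values and bandwidth are hypothetical, and the code is not taken from the GWR software.

```python
import math

def gaussian_weight(d, h):
    """Assumed fixed-bandwidth Gaussian weighting function."""
    return math.exp(-0.5 * (d / h) ** 2)

def local_mean_variance(u, locations, values, h):
    """Kernel-weighted mean and variance of `values` around location u."""
    weights = [
        gaussian_weight(math.hypot(u[0] - x, u[1] - y), h) for x, y in locations
    ]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, variance

# Hypothetical data: two nearby observations and one distant outlier.
locations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
values = [2.0, 4.0, 100.0]

mean, variance = local_mean_variance((0.5, 0.0), locations, values, h=1.0)
print(round(mean, 6), round(variance, 6))
```

With a bandwidth of one unit the distant observation receives almost no weight, so the local mean and variance at this location reflect only the two nearby values.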