If the dataset we have is not zone-based but point- or line-based, alternative methods of determining density are required. The simplest measures apply some form of zoning and count the number of events in each zone, returning us to a zone-based dataset. For example, with data on thefts from cars (TFC events), each event may be geocoded and assigned to a unique zone, such as a cell of a pre-defined 100m x 100m grid (or quadrat), a city block, a census block or a policing district. The density of TFC events in each zone is then simply the number of events reported to have occurred in that zone divided by its area.
A difficulty with this approach is that densities will vary substantially depending on how the zoning, grid or quadrat is selected, and on how much the zones vary in size and in associated attributes (e.g. variations in car ownership and usage patterns). An overall density calculation for all zones (e.g. a city-wide figure for the density of TFC events) suffers similar problems, especially with regard to the definition of the boundary (such problems are generally known as edge effects, and methods that adjust for them are known as edge correction or edge bias correction methods). Using such information to compare crime event levels within or across cities is clearly fraught with difficulties. An alternative approach, particularly suited to ecological datasets, is to define zones based on a covariate of interest, for example the underlying geology and terrain slope. Each classified area can then be treated as a separate zone type and densities computed within these (polygonal) boundaries. This type of computation is supported within spatstat.
An alternative approach to density computation for two-dimensional point sets is based on techniques developed in one-dimensional (univariate) statistical analysis. This is simplest to understand by looking at the example in Figure 4‑41. We have a set of 5 events marked by crosses, occurring at positions 7, 8, 9, 12 and 14 along the line. We could argue that the point density across the entire 20-unit line length is 5/20=0.25 points per unit length and assign a value of 0.25 to each section, as shown on the gray line. We might equally well argue that if we divide the line into two halves, the density over the first half should be 0.3 per unit length (3 events in 10 units) and over the second half 0.2 per unit length (2 events in 10 units), to reflect the variation in the positions of the 5 events.
There is clearly no single right answer, or single method, for assigning the points across the line's entire length, so the method we choose will depend on the application we are considering. Several features of this problem are worth noting: the length of the line we start with has an important effect on the density values we obtain, and since this length may be arbitrary, some way of removing the dependence on it is desirable; if the line is partitioned into discrete chunks, the density jumps abruptly at the partition boundaries, which is often undesirable; depending on the number of partitions and the distribution of points, some sections may have zero density, even where this is not the kind of spread we are seeking or regard as meaningful; the line is assumed to be continuous, so assigning density values to every part of it is valid; and finally, if we use too many partitions, every section will contain only 1 or 0 events, which is essentially where we started.
These issues can be addressed by treating each point in the original set as if it were spread over a range, then adding together the overlapping contributions and checking that the total equals the original number of points. For example, if we take each point and spread it uniformly and symmetrically over 5 units, we obtain the result shown in Figure 4‑42. The rows in the diagram show the spreading of each of the 5 original points, with the total row showing the sums (densities) assigned to each unit segment. These add up to 5, as they should, and a chart of this distribution confirms the pattern of spread. This method still leaves some difficulties: there are no density values towards the edges of our linear region; density values still jump abruptly from one segment to the next; and values are spread evenly around each initial point, whereas it might be more realistic to give greater weight to positions close to each point. All of these concerns can be addressed by selecting a well-defined, smooth and optionally unbounded function, known as a kernel, and using it to spread the values. A common choice is the Normal distribution, a bell-shaped curve extending to infinity in each direction but with a finite (unit) area underneath it (Figure 4‑43).
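The uniform spreading step can be reproduced in a few lines of Python. This is a minimal sketch, assuming (since the exact layout of Figure 4‑42 is not reproduced here) that an event at integer position p is shared equally among the five unit segments p-2 to p+2; the function name uniform_spread is hypothetical. Exact fractions are used so that the totals can be checked without rounding error.

```python
from fractions import Fraction

EVENTS = [7, 8, 9, 12, 14]  # event positions along the 20-unit line

def uniform_spread(events, length=20, spread=5):
    """Share each event equally among `spread` unit segments centred on it.

    Segments are indexed 1..length (an assumption). Returns a dict mapping
    each segment index to its total (density). Any share that would fall
    beyond the ends of the line is dropped; none does for these events.
    """
    half = spread // 2
    share = Fraction(1, spread)  # each segment receives 1/spread of an event
    totals = {pos: Fraction(0) for pos in range(1, length + 1)}
    for e in events:
        for pos in range(e - half, e + half + 1):
            if pos in totals:
                totals[pos] += share
    return totals

totals = uniform_spread(EVENTS)
for pos, value in totals.items():
    if value:
        print(pos, float(value))
print("total:", float(sum(totals.values())))  # 5.0, one unit per event
```

The printed rows correspond to the "total" row of the spreading table: segments near the cluster at 7, 8 and 9 receive 0.6, while segments far from any event receive nothing, which is the edge and abrupt-jump problem noted above.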
Figure 4‑43 Univariate Normal kernel smoothing and cumulative densities

Figure 4‑44 Alternative univariate kernel density functions

In Figure 4‑43, for each point (7, 8, 9, 12 and 14) we have drawn a Normal distribution curve centered on (with its mean at) the point in question and with a spread (standard deviation) of 2 units. We can then add these curves together, position by position, to obtain a (cumulative) curve with two peaks, and then divide this curve by 5 so that the area under it is 1 again, giving the lower red curve shown. When adjusted in this way the values are often described as probability densities, and when extended to two dimensions, the resulting surface is described as a probability density surface rather than a density surface. We now have a density value for every position along the original line, with smooth transitions between the values, which is exactly what we were trying to achieve.
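The Normal-kernel calculation described above can be sketched as follows, using the same five events and a standard deviation of 2 units. The helper names are illustrative, and the area under the combined curve is checked with a simple midpoint rule rather than any particular library routine.

```python
import math

EVENTS = [7, 8, 9, 12, 14]

def normal_pdf(x, mean, sd):
    """Height of a Normal curve with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def kde(x, events, sd=2.0):
    """Add one Normal curve per event, then divide by the number of events
    so that the area under the combined curve is 1."""
    return sum(normal_pdf(x, e, sd) for e in events) / len(events)

# Check the area numerically over a range wide enough to hold the tails.
n = 6000
step = 60 / n
xs = [-20 + step * (i + 0.5) for i in range(n)]
area = sum(kde(x, EVENTS) for x in xs) * step
print(round(area, 4))
```

Omitting the division by len(events) would give a curve whose area equals the number of events (5), which corresponds to the unadjusted density rather than the probability density.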
Some questions still remain. Why should we use the Normal distribution? Could we not use almost any unimodal symmetric function with a finite area under it? And why did we choose a value of 2 units for the spread? The answer is that these specific selections are a matter of choice and experience, although in some instances a symmetric distribution with a finite extent (e.g. a box or triangular function) may be regarded as more suitable than one with an infinite extent. Figure 4‑44 shows a selection of commonly used functions plotted for the same point set, using the MATLAB Statistics Toolbox function ksdensity(). As the curves in Figure 4‑44 show, the exact form of the kernel function does not tend to have a major impact on the set of density values assigned across the linear segment (or area in 2D applications). Of much greater impact is the choice of the spread parameter, or bandwidth. For further details on these and other functions used for smoothing and density estimation see Bowman and Azzalini (1997) and Silverman (1986).
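The observation that bandwidth matters more than kernel shape can be checked numerically. The sketch below is an illustration, not a reproduction of Figure 4‑44 (which used MATLAB's ksdensity()): it compares a Normal kernel with an Epanechnikov kernel scaled to the same standard deviation, then compares two Normal kernels with different bandwidths. Using the maximum absolute difference over a grid as the comparison measure is an assumption made for illustration.

```python
import math

EVENTS = [7, 8, 9, 12, 14]

def normal_kernel(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def epanechnikov_kernel(t):
    return 0.75 * (1 - t * t) if abs(t) <= 1 else 0.0

def kde(x, events, kernel, h):
    """Standard univariate KDE: (1 / (n h)) * sum of K((x - x_i) / h)."""
    return sum(kernel((x - e) / h) for e in events) / (len(events) * h)

xs = [i * 0.05 for i in range(401)]  # evaluation grid over 0..20

def max_gap(f, g):
    return max(abs(f(x) - g(x)) for x in xs)

# Same spread (SD = 2), different shapes. The Epanechnikov kernel on [-1, 1]
# has variance 1/5, so h = 2 * sqrt(5) gives it a standard deviation of 2.
shape_gap = max_gap(lambda x: kde(x, EVENTS, normal_kernel, 2.0),
                    lambda x: kde(x, EVENTS, epanechnikov_kernel, 2 * math.sqrt(5)))

# Same shape (Normal), different bandwidths (SD = 1 versus SD = 3).
bandwidth_gap = max_gap(lambda x: kde(x, EVENTS, normal_kernel, 1.0),
                        lambda x: kde(x, EVENTS, normal_kernel, 3.0))

print(f"max difference, kernel shape: {shape_gap:.3f}")
print(f"max difference, bandwidth:    {bandwidth_gap:.3f}")
```

Matching the kernels' standard deviations before comparing them matters: comparing a Normal kernel and an Epanechnikov kernel with the same h would mix a change of shape with a change of effective spread.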
All of this discussion so far addresses problems in one dimension (univariate density estimation). We now need to extend the process to two dimensions, which turns out to be simply a matter of taking the univariate procedures and adding a second dimension (effectively rotating the kernel function about each point). If we were to use the Normal distribution again as our kernel function it would have a two-dimensional bell-shaped form over every point (Figure 4‑45). As before, we place the kernel function over each point in our study region and calculate the value contributed by that point over a finely drawn grid. The grid resolution does not affect the resulting surface form to any great degree, but if possible should be set to be meaningful within the context of the dataset being analyzed, including any known spatial errors or rounding that may have been applied, and making allowance for any areas that should be omitted from the computations (e.g. industrial zones, water, parks etc. when considering residential-based data). Values for all points at every grid intersection or for every grid cell are then computed and added together to give a composite density surface.
Figure 4‑45 2D Normal kernel
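The 2D procedure can be sketched in Python as a loop over grid cells, adding the contribution of every event at each cell centre. The event coordinates, the 100 m square study region, the 1 m cell size and the bandwidth are hypothetical values chosen for illustration, and a bivariate Normal kernel is assumed. Because this kernel is unbounded, a little of each event's mass falls outside the region, so the integrated total comes out slightly below the number of events.

```python
import math

# Hypothetical event coordinates (metres) in a 100 m x 100 m study region.
EVENTS = [(20.0, 30.0), (25.0, 35.0), (30.0, 28.0), (70.0, 75.0), (72.0, 68.0)]

def kde_surface(events, h, cell, width, height):
    """Normal-kernel density evaluated at the centre of every grid cell.

    Each event contributes a bivariate Normal with standard deviation h and
    unit volume, so values are events per unit area (here per square metre).
    """
    norm = 1.0 / (2.0 * math.pi * h * h)
    surface = []
    for row in range(int(height / cell)):
        y = (row + 0.5) * cell
        values = []
        for col in range(int(width / cell)):
            x = (col + 0.5) * cell
            # Sum the contribution of every event at this cell centre.
            values.append(sum(
                norm * math.exp(-((x - ex) ** 2 + (y - ey) ** 2) / (2.0 * h * h))
                for ex, ey in events))
        surface.append(values)
    return surface

CELL = 1.0
surface = kde_surface(EVENTS, h=8.0, cell=CELL, width=100.0, height=100.0)
# Integrating the surface recovers almost all of the 5 events; the shortfall
# is kernel mass lying outside the study region (an edge effect).
total = sum(map(sum, surface)) * CELL * CELL
print(round(total, 3))
```

Cells falling in areas that should be omitted (water, parks and so on) could be masked out of the loop; that refinement is not shown.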

The resulting grid values may be provided as: (i) relative densities, i.e. values in events per unit area (adjusted by the grid cell size, giving a figure such as events per square meter or per hectare), which is the default or only option in many GIS packages, including ArcGIS; (ii) absolute densities, i.e. values in events per grid cell, which are not adjusted by cell size, so that the sum of the values across all cells should equal the number of events used in the analysis; or (iii) probabilities, i.e. absolute densities divided by the total number of events. Crimestat supports all three variants. Computed density values may then be plotted in 2D (e.g. as density contours) or as a 3D surface (as in Figure 4‑46). In that example the kernel density procedure has been applied to a dataset of reported cases of lung cancer in part of Lancashire, England. This example is discussed further in Section 5.4.4, and fully in Diggle (1990) and more recently in Baddeley et al. (2005). The software used in this instance was Crimestat, with a Normal kernel function and an average spread (bandwidth) determined from the point pattern itself. The results were output in ESRI shapefile format (SHP) and mapped in ArcGIS to produce a 2D map, and in ASCII grid format for 3D visualization in Surfer. Other GIS packages support a variety of kernel functions and procedures. ArcGIS Spatial Analyst provides kernel density estimation for point and line objects, but supports only one kernel function, which it describes as a quadratic kernel (a bounded kernel) but which is often described as an Epanechnikov kernel (see further, Table 4‑8).
Figure 4‑46 Kernel density map, Lung Case data, 3D visualization
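The relationship between the three output forms can be illustrated with a short sketch. It assumes a Normal kernel, hypothetical event coordinates, a 5 m cell size and a 100 m square region; it is not a description of how Crimestat or ArcGIS compute their outputs internally. Relative densities here are per square metre; multiplying by 10,000 would give values per hectare.

```python
import math

EVENTS = [(20.0, 30.0), (25.0, 35.0), (30.0, 28.0), (70.0, 75.0), (72.0, 68.0)]  # hypothetical
H, CELL, SIZE = 8.0, 5.0, 100.0  # bandwidth (m), cell width (m), region width (m)

def relative_density(x, y):
    """Events per square metre at (x, y), bivariate Normal kernel with SD = H."""
    norm = 1.0 / (2 * math.pi * H * H)
    return sum(norm * math.exp(-((x - ex) ** 2 + (y - ey) ** 2) / (2 * H * H))
               for ex, ey in EVENTS)

n_cells = int(SIZE / CELL)
centres = [((c + 0.5) * CELL, (r + 0.5) * CELL)
           for r in range(n_cells) for c in range(n_cells)]

# (i) relative density: events per unit area (per square metre here)
relative = [relative_density(x, y) for x, y in centres]
# (ii) absolute density: events per cell = relative density x cell area
absolute = [v * CELL * CELL for v in relative]
# (iii) probability: absolute density divided by the number of events
probability = [v / len(EVENTS) for v in absolute]

# The absolute values sum to (almost) the event count; the probabilities to
# (almost) 1. The small shortfall is kernel mass outside the region.
print(round(sum(absolute), 2), round(sum(probability), 3))
```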

MapInfo’s grid analysis add-on package, Vertical Mapper, includes kernel density mapping as do the spatial statistics packages SPLANCS and spatstat, and the crime analysis add-on for MapInfo, Hot Spot Detective (in the latter two cases based on quartic kernels). TransCAD/Maptitude supports what it describes as density grid creation with the option of count (simple), quartic, triangular or uniform kernels. Crimestat supports four alternative kernels to the Normal, all of which have finite extent (i.e. typically are defined to have a value of 0 beyond a specified distance). These are known as the quartic, exponential, triangular and uniform kernels. If kernel density methods are applied in urban areas the use of network kernel density estimation should be considered (see Section 0), as Okabe et al. (2009) have shown that the use of planar KDEs can lead to erroneous conclusions regarding clustering for point events on networks.
The details of the main kernel functions used in various GIS and statistical packages are shown in Table 4‑8. The value at grid location $g_j$ is obtained by summing the kernel function over all event points $i$ in the source dataset, each term being evaluated at the distance $d_{ij}$ between $g_j$ and event point $i$. The table shows normalized functions, in which the distances $d_{ij}$ have been divided by the kernel bandwidth, $h$, i.e. $t=d_{ij}/h$. Graphs of these functions are shown in Figure 4‑47, where each has been normalized so that the area under the graph is 1.
Whether the kernel for a particular event point contributes to the value at a grid point depends on: (i) the type of kernel function (bounded or not); (ii) the parameter, k, if applicable, which may be user-defined or in some instances determined automatically; and (iii) the bandwidth, h, that is selected (a larger bandwidth spreads the influence of event points over a greater distance, but is also more likely to experience edge effects close to the study region boundaries). Event points may also be weighted, giving some of them greater influence than others.
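The effect of a bounded kernel and of event weights on a single grid value can be sketched as below. This assumes a 2D quartic kernel normalized to unit volume, K(t) = (3/π)(1 - t²)² for t ≤ 1 and 0 otherwise, scaled by 1/h²; the function names, event locations and weights are hypothetical.

```python
import math

def quartic_2d(t):
    """2-D quartic kernel with unit volume: (3/pi)(1 - t^2)^2 for t <= 1, else 0."""
    return (3 / math.pi) * (1 - t * t) ** 2 if t <= 1 else 0.0

def grid_value(g, events, h, weights=None):
    """Value at grid point g: sum over events i of w_i * K(d_ij / h) / h^2.

    Because the quartic kernel is bounded, only events with d_ij <= h
    contribute. `weights` defaults to 1 for every event.
    """
    if weights is None:
        weights = [1.0] * len(events)
    total = 0.0
    for (ex, ey), w in zip(events, weights):
        d = math.hypot(g[0] - ex, g[1] - ey)  # distance d_ij
        total += w * quartic_2d(d / h) / (h * h)
    return total

events = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]  # hypothetical; the last is 50 m from g
g = (0.0, 0.0)
print(grid_value(g, events, h=10.0))                            # third event ignored
print(grid_value(g, events, h=100.0))                           # third event now counts
print(grid_value(g, events, h=10.0, weights=[2.0, 1.0, 1.0]))   # first event doubled
```

With h = 10 the event 50 m away contributes exactly nothing; raising h to 100 brings it within range, which is the trade-off between spreading influence and picking up more distant (and boundary) effects.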
Table 4‑8 Widely used univariate kernel density functions
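| Kernel | Formula | Comments. Note $t=d_{ij}/h$, where $h$ is the bandwidth |
|---|---|---|
| Normal | $K(t)=\frac{1}{\sqrt{2\pi}}e^{-t^{2}/2}$ | Unbounded, hence defined for all $t$. The standard kernel in Crimestat; bandwidth $h$ is the standard deviation (and may be fixed or adaptive) |
| Quartic | $K(t)=k\,(1-t^{2})^{2}$ for $\lvert t\rvert\le 1$; 0 otherwise | Bounded. Approximates the Normal. $k$ is a constant |
| Exponential | $K(t)=A\,e^{-k\lvert t\rvert}$ (for $\lvert t\rvert\le 1$ if bounded) | Optionally bounded. $A$ is a constant (e.g. $A=3/2$) and $k$ is a parameter (e.g. $k=3$). Weights the central point more heavily than other kernels |
| Triangular | $K(t)=1-\lvert t\rvert$ for $\lvert t\rvert\le 1$; 0 otherwise | Bounded. Very simple linear decay with distance |
| Uniform (flat) | $K(t)=k$ for $\lvert t\rvert\le 1$; 0 otherwise | Bounded. $k$ is a constant. No central weighting, so the function is like a uniform disk placed over each event point |
| Epanechnikov (paraboloid/quadratic) | $K(t)=\frac{3}{4}(1-t^{2})$ for $\lvert t\rvert\le 1$; 0 otherwise | Bounded; optimal smoothing function for some statistical applications; used as the smoothing function in the Geographical Analysis Machine (GAM/K) and in ArcGIS |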
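The kernels in Table 4‑8 can be written directly as functions of t. In this sketch the constants are an assumption: k = 15/16 for the quartic and k = 1/2 for the uniform kernel are chosen so that each integrates to 1 in one dimension, and the exponential kernel uses the example values A = 3/2 and k = 3 in its bounded form. With those values its area is 1 - exp(-3), about 0.95, so it is only approximately normalized.

```python
import math

# Univariate kernels K(t), with t = d_ij / h.
def normal(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)

def quartic(t, k=15 / 16):
    return k * (1 - t * t) ** 2 if abs(t) <= 1 else 0.0

def exponential(t, A=1.5, k=3.0):
    # Bounded variant, using the example values of A and k.
    return A * math.exp(-k * abs(t)) if abs(t) <= 1 else 0.0

def triangular(t):
    return 1 - abs(t) if abs(t) <= 1 else 0.0

def uniform(t, k=0.5):
    return k if abs(t) <= 1 else 0.0

def epanechnikov(t):
    return 0.75 * (1 - t * t) if abs(t) <= 1 else 0.0

def area(kernel, lo=-8.0, hi=8.0, n=32000):
    """Midpoint-rule approximation of the area under a kernel."""
    step = (hi - lo) / n
    return sum(kernel(lo + (i + 0.5) * step) for i in range(n)) * step

KERNELS = [("Normal", normal), ("Quartic", quartic), ("Exponential", exponential),
           ("Triangular", triangular), ("Uniform", uniform),
           ("Epanechnikov", epanechnikov)]
for name, K in KERNELS:
    print(f"{name:13s} area = {area(K):.4f}")
```

A density at distance d with bandwidth h is then K(d / h) / h in one dimension, summed over events and divided by their number when a probability density is wanted.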
Bandwidth selection is often more of an art than a science, but it may be subject to formal analysis and estimation, for example by applying kernel density estimation (KDE) procedures to sets of data where actual densities are known. An alternative to fixed bandwidth selection is adaptive selection, whereby the user specifies the selection criterion, for example defining the number of event points to include within a circle centered on each event point, and taking the radius of this circle as the bandwidth around that point. A somewhat different form of adaptive density model is provided within the spatstat software. In this case a random sample of points (a fraction, f%) is taken, and this is used to create a Voronoi tessellation of the sample point set. The density of the remaining points (1-f%) is then computed for each cell of the tessellation. The process is then replicated n times and the average densities of all replicates computed.
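The nearest-neighbour form of adaptive bandwidth selection can be sketched for the one-dimensional example used earlier. This assumes that the "circle" around each event is the interval reaching its k-th nearest other event, and that each event then uses its own bandwidth in a Normal kernel (a sample-point adaptive estimator); the function names are hypothetical, and the spatstat Voronoi-based approach is not shown.

```python
import math

EVENTS = [7, 8, 9, 12, 14]

def knn_bandwidths(events, k):
    """Bandwidth for each event: the distance to its k-th nearest other event,
    i.e. the radius of the smallest interval holding k other events."""
    hs = []
    for i, e in enumerate(events):
        dists = sorted(abs(e - o) for j, o in enumerate(events) if j != i)
        hs.append(dists[k - 1])
    return hs

def adaptive_kde(x, events, hs):
    """Normal-kernel KDE in which event i uses its own bandwidth hs[i]."""
    total = 0.0
    for e, h in zip(events, hs):
        z = (x - e) / h
        total += math.exp(-0.5 * z * z) / (h * math.sqrt(2 * math.pi))
    return total / len(events)

hs = knn_bandwidths(EVENTS, k=2)
print(hs)  # [2, 1, 2, 3, 5]
```

Events in the dense cluster (such as 8) receive narrow kernels, while the isolated event at 14 receives a wide one, so sparse areas are smoothed more heavily than dense ones.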
Figure 4‑47 Univariate kernel density functions, unit bandwidth
Panels: A. Constant; B. Normal, SD=1; C. Exponential; D. Quadratic; E. Quartic; F. Triangular (or linear)
|
Kernel smoothing, or kernel density estimation methods (KDE methods) of the type described have a variety of applications: exploratory point data analysis; point data smoothing; creation of continuous surfaces from point data in order to combine or compare these with other datasets that are continuous/in raster form; probability distribution estimation; interpolation (although this terminology is confusing and not recommended; Crimestat is amongst a number of packages that use this terminology, which is essentially incorrect); and hot spot detection.
KDE can also be used to visualize and analyze temporal patterns, for example crime events at different times of day and/or over different periods, with the objective of understanding and potentially predicting event patterns. KDE can also be applied to more than one point set, for example a set of cases and a set of controls. The output of such “dual” dataset analysis is normally the ratio of the primary density to the secondary density (or the logarithm of this ratio), the objective being to analyze the primary pattern with background effects removed or minimized. In such cases care must be taken that the results are not distorted by very low or zero values in the second density surface, and that common bandwidths are used. Crimestat provides six alternative dual density outputs: simple ratio; log ratio; difference in densities (two variants, with and without standardization); and sum of densities (again, with and without standardization). Levine (2007, Chapter 8) provides an excellent discussion of the various techniques and options, including alternative methods for bandwidth selection, together with examples from crime analysis, health research, urban development and ecology.
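A simple ratio and log ratio of two kernel density surfaces can be sketched as follows. The case and control locations are hypothetical, a one-dimensional Normal kernel with a common bandwidth is assumed, and the `floor` parameter is an illustrative safeguard against near-zero control densities rather than the method Crimestat uses.

```python
import math

def kde(x, events, h):
    """1-D Normal-kernel density (unit area) with standard deviation h."""
    total = sum(math.exp(-0.5 * ((x - e) / h) ** 2) for e in events)
    return total / (len(events) * h * math.sqrt(2 * math.pi))

def dual_ratio(x, cases, controls, h, floor=1e-6):
    """Case density divided by control density at x, using one common
    bandwidth h. The hypothetical `floor` stops tiny control densities
    from producing huge or infinite ratios."""
    return kde(x, cases, h) / max(kde(x, controls, h), floor)

def dual_log_ratio(x, cases, controls, h, floor=1e-6):
    """Log of the case/control density ratio, with the same floor applied."""
    return (math.log(max(kde(x, cases, h), floor))
            - math.log(max(kde(x, controls, h), floor)))

cases = [7, 8, 9, 12, 14]             # hypothetical case locations
controls = [2, 5, 8, 11, 13, 16, 19]  # hypothetical control locations
for x in (0, 8, 13, 40):
    print(x, round(dual_ratio(x, cases, controls, h=2.0), 3),
          round(dual_log_ratio(x, cases, controls, h=2.0), 3))
```

Using the same h for both surfaces keeps the ratio from reflecting differences in smoothing rather than differences in the underlying patterns.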
The use of kernel functions as a form of point weighting enables the creation of local weighted means and variances. The locally weighted (kernel) statistics supported within the GWR software are defined as follows:
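$$\bar{z}(u)=\frac{\sum_{i} w(u_i-u)\,z_i}{\sum_{i} w(u_i-u)}, \qquad s^{2}(u)=\frac{\sum_{i} w(u_i-u)\,\bigl(z_i-\bar{z}(u)\bigr)^{2}}{\sum_{i} w(u_i-u)}$$
where $z_i$ is the attribute value observed at location $u_i$.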
These are local statistics, based on locations, u, and weighting function w() defined by a fixed or adaptive kernel function. The bandwidth in this case must be user-defined, and might be chosen on the basis of experience or could use an adaptively derived bandwidth (e.g. from an associated regression study) for comparison purposes.
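The local weighted mean and variance can be sketched in Python as follows. The fixed Gaussian weighting function, the sample locations and values, and the bandwidth are all assumptions made for illustration; the GWR software's own kernel options may differ.

```python
import math

def gaussian_weight(d, h):
    """Fixed Gaussian kernel weight for distance d and bandwidth h."""
    return math.exp(-0.5 * (d / h) ** 2)

def local_stats(u, locations, values, h):
    """Kernel-weighted mean and variance of `values` around location u."""
    w = [gaussian_weight(math.dist(u, loc), h) for loc in locations]
    total_w = sum(w)
    mean = sum(wi * z for wi, z in zip(w, values)) / total_w
    var = sum(wi * (z - mean) ** 2 for wi, z in zip(w, values)) / total_w
    return mean, var

# Hypothetical sample: attribute values z_i observed at locations u_i.
locations = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
values = [5.0, 6.0, 5.5, 20.0, 21.0]
print(local_stats((0.5, 0.5), locations, values, h=2.0))
print(local_stats((10.5, 10), locations, values, h=2.0))
```

Near (0.5, 0.5) the distant values of 20 and 21 receive negligible weight, so the local mean is close to 5.5; near (10.5, 10) the situation is reversed, which is what distinguishes these local statistics from a single global mean.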