Having agreed on the problem definition, the next stage is to formulate an approach that has the best possible chance of addressing the problem and achieving answers (outcomes) that meet expectations. Although the PLAN phase is next in the sequence, the iterative nature of the PPDAC process emphasizes the need to define and then revisit each component. Thus, whilst an outline project plan would be defined at this stage, one would have to consider each of the subsequent stages (DATA, ANALYSIS, CONCLUSIONS) before firming up the detail of the plan. With projects that are more experimental in nature, drawing up the main elements of the PLAN takes place at this stage; with projects for which pre-existing datasets and analysis tools are expected to be used, the PLAN stage is much more an integrated part of the whole PPDAC exercise.
The output of the PLAN stage is often formulated as a detailed project plan, with allocation of tasks, resources and timescales, analysis of critical path(s) and activities, and estimated costs of data, equipment, software tools, manpower, services, etc. Frequently, project plans are produced with the aid of formal tools, which may be paper-based or software-assisted. In many instances this will involve determining all the major tasks or task blocks that need to be carried out, identifying the interconnections between these building blocks (and their sequencing), and then examining how each task block breaks down into sub-elements. This then translates into an initial programme of work once estimated timings and resources are included, which can be modified and fine-tuned as an improved understanding of the project is developed. In some instances this will be part of the planning process itself, where a formal functional specification and/or pilot project forms part of the overall plan.
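The task-sequencing and critical-path element described above can be sketched very simply. The following is an illustrative sketch only: the task names, durations (in days) and dependencies are invented, and the earliest-finish calculation stands in for what a formal project-planning tool would compute.

```python
# Hypothetical task blocks for a small PPDAC-style project.
# All names, durations and dependencies are invented for illustration.
tasks = {
    # task: (duration_days, [predecessor tasks])
    "define_problem": (5,  []),
    "acquire_data":   (15, ["define_problem"]),
    "pilot_study":    (10, ["define_problem"]),
    "main_analysis":  (20, ["acquire_data", "pilot_study"]),
    "report":         (8,  ["main_analysis"]),
}

def earliest_finish(tasks):
    """Earliest finish time for each task = longest path from the start."""
    finish = {}
    def ef(name):
        if name not in finish:
            dur, preds = tasks[name]
            finish[name] = dur + max((ef(p) for p in preds), default=0)
        return finish[name]
    for name in tasks:
        ef(name)
    return finish

finish = earliest_finish(tasks)
project_length = max(finish.values())
# The critical path here is define_problem -> acquire_data ->
# main_analysis -> report: 5 + 15 + 20 + 8 = 48 days.
print(project_length)
```

Tasks whose delay would extend `project_length` (here, every task except the pilot study) are the ones to monitor most closely as the plan is re-evaluated.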
As with other parts of the PPDAC process, the PLAN stage is not a one-shot static component, but typically includes a process of monitoring and re-evaluation of the plan, such that issues of timeliness, budget, resourcing and quality can be monitored and reported in a well-defined manner.
The approach adopted involves consideration of many issues, including:
• the nature of the problem and project: is it purely investigative, or a formal research exercise; is it essentially descriptive, including identification of structures and relationships, or more concerned with processes, in which a clearer understanding of causes and effects may be required, especially if predictive models are to be developed and/or prescriptive measures are anticipated as an output?
• does it require commercial costings and/or cost-benefit analysis?
• are particular decision-support tools and procedures needed?
• what level of public involvement and public awareness is involved, if any?
• what particular operational needs and conditions are associated with the exercise?
• what time is available to conduct the research, and are there any critical (final or intermediate) deadlines?
• what funds and other resources are available?
• is the project considered technically feasible, what assessable risk is there of failure, and how is this affected by problem complexity?
• what are the client (commercial, governmental, academic) expectations?
• are there specifications, standards, quality parameters and/or procedures that must be used (for example, to comply with national guidelines)?
• how does the research relate to other studies on the same or similar problems?
• what data components are needed and how will they be obtained (existing sources, collected datasets)?
• are the data units to be studied selected from the target population, or will the sample be distinct in some way and the results applied to the population subsequently (in which case one must consider not just sampling error but so-called study error also)?
When deciding upon the design approach and analytical methods/tools it is essential to identify available datasets, examine their quality, strengths and weaknesses, and carry out exploratory work on subsets or samples in order to clarify the kind of approach that will be both practical and effective. There will always be unknowns at this stage, but the aim should be to minimize these at the earliest opportunity, if necessary by working through the entire process, up to and including drafting the presentation of results based on sample, hypothetical or simulated data.
The application of a single analytical technique or software tool is often to be avoided unless one is extremely confident of the outcome, unless the analytical technique or approach itself is the subject of investigation, or unless this approach or toolset has already been approved for use in such cases. If analysis is not limited to a single approach, and a series of outputs, visualizations, techniques and tests all suggest a similar outcome, then confidence in the findings tends to be greatly increased. If such techniques suggest different outcomes, the analyst is encouraged to explain the differences by re-examining the design, the data and/or the analytical techniques and tools applied. Ultimately the original problem definition may have to be reviewed.
The impact on research of exceptions — rare events, spatial outliers, extreme values, unusual clusters — is extremely important in geospatial analysis. Exploratory methods, such as mapping and examining cases and producing box-plots (see further, Section 5.2.2, Outlier detection), help to determine whether these observations are valid and important, or require removal from the study set.
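As a minimal illustration of the kind of exploratory screening described above, the sketch below applies the box-plot (interquartile range) rule to a set of invented attribute values; in practice one would also map and inspect the flagged cases before deciding whether they are valid and important observations or candidates for removal.

```python
# Box-plot style (IQR) outlier screening on invented attribute values.
import statistics

values = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 25.0, 11.7, 12.5]

q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences

outliers = [v for v in values if v < lower or v > upper]
print(outliers)  # the extreme value 25.0 is flagged for inspection
```

The flagged value is a candidate for scrutiny, not automatic deletion: the box-plot rule only identifies observations worth investigating.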
Some analytical techniques are described as being more robust than others, meaning that they are less susceptible to data extremes or unusual datasets. For example, the median or middle value of a dataset is generally regarded as more robust than the mean or average value, because it is unaffected by the specific magnitudes of the extreme values in the set. However, the spatial mean and median exhibit different properties from their counterparts applied to individual tabulated attributes, and other measures of centrality (e.g. the central feature of a set) may be more appropriate in some instances. Likewise, statistical tests that make no assumptions about the underlying distribution of the dataset tend to be more robust than those that assume particular distributional characteristics, as with non-parametric versus parametric tests. However, increasing robustness may result in a loss of power: some methods are described as being more powerful than others, i.e. they are less likely to accept hypotheses that are incorrect or to reject hypotheses that are correct.
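The robustness of the median can be demonstrated with a small, invented numeric example using Python's standard statistics module; note that, as cautioned above, this tabular behaviour does not carry over directly to spatial means and medians.

```python
# The median resists a single extreme value that shifts the mean markedly.
# Values are invented for illustration.
import statistics

clean = [10, 11, 12, 13, 14]
with_outlier = clean + [500]  # one extreme observation

print(statistics.mean(clean), statistics.median(clean))  # both 12
print(statistics.mean(with_outlier), statistics.median(with_outlier))
# the mean jumps to about 93.3, while the median moves only to 12.5
```

The same contrast underlies the robust/powerful trade-off: the mean uses every value's magnitude (more information, less robustness), whereas the median uses only their ranks.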