Having agreed on the problem definition, the next stage is to formulate an approach that has the best possible chance of addressing the problem and achieving answers (outcomes) that meet expectations. Formulating the approach involves consideration of issues such as:
· the nature of the problem and project: is it purely investigative, or a formal research exercise? Is it essentially descriptive, including identification of structures and relationships, or more concerned with processes, for which a clearer understanding of causes and effects may be required, especially if predictive models are to be developed and/or prescriptive measures are anticipated as an output?
· does it require commercial costings and/or cost-benefit analysis?
· are particular decision-support tools and procedures needed?
· what level of public involvement and public awareness is expected, if any?
· what particular operational needs and conditions are associated with the exercise?
· what time is available to conduct the research and are there any critical deadlines?
· what funds and other resources are available?
· is the project considered technically feasible? What assessable risk of failure is there, and how is this affected by problem complexity?
· what are the client (commercial, governmental, academic) expectations?
· are there specifications, standards, quality parameters and/or procedures that must be used (for example to comply with national guidelines)?
· how does the research relate to other studies on the same or similar problems?
· what data components are needed and how will they be obtained (existing sources, collected datasets)?
· are the units of data to be studied to be selected from the target population, or will the sample be distinct in some way, with the findings applied to the population subsequently (in which case one must consider not just sampling error but also so-called study error)?
When deciding upon the design approach and analytical methods it is essential to identify available datasets, examine their quality, strengths and weaknesses, and carry out exploratory work on subsets or samples in order to clarify the kind of approach that will be both practical and effective. There will always be unknowns at this stage, but the aim should be to minimise these at the earliest opportunity, if necessary by working through the entire process, up to and including drafting the presentation of results based on sample, hypothetical or simulated data.
The application of a single analytical technique is to be avoided unless one is extremely confident of the outcome, or it is the analytical technique or approach itself that is the subject of investigation. If a series of approaches, visualisations, techniques and tests all suggest a similar outcome then confidence in the findings tends to be greatly increased. If such techniques suggest different outcomes the analyst is encouraged to explain the differences, by re-examining the design, the data and/or the analytical techniques applied. Ultimately the original problem definition may have to be reviewed.
The impact on research of exceptions — rare events, spatial outliers, extreme values, unusual clusters — is extremely important in geospatial analysis. Exploratory methods, such as mapping and examining cases and producing box-plots (see further, Section 5.2.2.2), help to determine whether these observations are valid and important, or require removal from the study set.
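The box-plot screening mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles; the function names and the sample values are hypothetical and are not taken from any particular GIS package. Flagging a value only marks it for inspection; whether it is valid and important, or should be removed, remains a judgement for the analyst.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) limits used for box-plot whiskers.

    Assumes the conventional k * IQR rule; k = 1.5 is a convention,
    not a requirement.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical attribute values for ten zones; the last is an extreme value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
flagged = flag_outliers(sample)
print(flagged)
```

Mapping the flagged cases alongside their neighbours is the natural next step, since a value that is unremarkable in the attribute table may still be a spatial outlier relative to nearby observations.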
Some analytical techniques are described as being more robust than others. By this is meant that they are less susceptible to data extremes or unusual datasets. For example, the median or middle value of a dataset is generally regarded as more robust than the mean or average value, because it depends only on the value(s) at the middle of the ordered set and is unaffected by the magnitudes of the extreme values. However, the spatial mean and median exhibit different properties from those applied to individual tabulated attributes, and other measures of centrality (e.g. the central feature of a set) may be more appropriate in some instances. Likewise, statistical tests that make no assumptions about the underlying distribution of the dataset (non-parametric tests) tend to be more robust than those that assume particular distributional characteristics (parametric tests). However, increasing robustness may result in loss of power, in the sense that some methods are described as being more powerful than others, i.e. at a given significance level they are more likely to reject a hypothesis that is in fact incorrect.
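The contrast between the mean and the median, and between a computed centre and the central feature, can be illustrated with a small sketch. The data below are invented for illustration, and the helper names `mean_centre` and `central_feature` are hypothetical; the central feature is taken here to be the member of the point set with the smallest total straight-line distance to all other members, which is one common definition.

```python
import math
import statistics


def mean_centre(points):
    """Mean centre: the average of the x and of the y coordinates."""
    xs, ys = zip(*points)
    return (statistics.fmean(xs), statistics.fmean(ys))


def central_feature(points):
    """Member of the set with the smallest summed distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Attribute example: replacing the largest value with an extreme one
# shifts the mean substantially but leaves the median unchanged.
values = [10, 11, 12, 13, 14]
with_extreme = values[:-1] + [140]
print(statistics.fmean(values), statistics.median(values))
print(statistics.fmean(with_extreme), statistics.median(with_extreme))

# Spatial example: four clustered points and one remote point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50)]
print(mean_centre(points))      # pulled towards the remote point
print(central_feature(points))  # always an actual member of the set
```

In this example the mean centre falls in empty space between the cluster and the remote point, whereas the central feature is by construction one of the observed locations, which is why it may be the more appropriate measure in some applications.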