A model is valid to the extent that it adequately represents the system being modeled. However, the validity of a model should not be thought of as binary (i.e. a model cannot simply be classified as valid or invalid); rather, a model has a certain degree of validity (Law and Kelton, 1991). The degree of validity can be assessed by comparing the output of the model with comparable data collected from the real-world system. To understand the output of an agent-based model, it is often necessary to examine the details of a specific simulation ‘history’. There are at least three ways in which a history can be described (Axelrod, 2006):

• History can be reported as a selection of key events in chronological order. For instance, the simulation of a train station evacuation could be described at the point in time when the emergency alarm sounds, when strategic areas of the station have been evacuated (e.g. platform, escalator/stairs, ticket hall), and when the station has been fully vacated. Whilst informative, this method provides little explanatory power about the model itself.

• Alternatively, the history of one agent can be documented. For example, one could record the location of a pedestrian within the station when the emergency alarm sounds, the time taken for the pedestrian to reach strategic locations within the station thereafter (e.g. platform, escalator/stairs, ticket hall, building exit), and a summary of the route traversed by the agent. This is often the easiest type of history to understand, and it can be very revealing about the way in which the model works (i.e. how the logic of the model affects agents over time).

• Finally, the history can be recorded from a global viewpoint. For example, the distribution of pedestrians throughout the station over time may be used to assess the usefulness of different emergency exits. Although the global viewpoint is generally regarded as the best method for observing large-scale patterns, several detailed histories are usually required to explain the reasons behind these patterns.
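To make the three kinds of history concrete, the following minimal Python sketch records all of them from one run of a deliberately simplified evacuation model. The zone names, the movement rule and the function `simulate_evacuation` are hypothetical illustrations, not part of any established ABM toolkit: each agent simply advances one zone towards the exit with probability `p_move` per time step.

```python
import random
from collections import Counter

# Hypothetical zones, ordered from the platform towards the exit.
ZONES = ["platform", "stairs", "ticket_hall", "exit"]

def simulate_evacuation(n_agents=50, p_move=0.6, seed=1, max_steps=1000):
    """Toy evacuation recording three kinds of history (a sketch, not a real ABM)."""
    rng = random.Random(seed)
    exit_idx = len(ZONES) - 1
    # Each agent starts in a random non-exit zone when the alarm sounds (t = 0).
    pos = [rng.randrange(exit_idx) for _ in range(n_agents)]
    key_events = [(0, "alarm sounds")]                 # 1. selected key events
    agent_history = [(0, ZONES[pos[0]])]               # 2. history of a single agent
    global_history = [Counter(ZONES[p] for p in pos)]  # 3. global viewpoint
    cleared = set()
    for t in range(1, max_steps + 1):
        # Each agent still inside moves one zone closer to the exit with probability p_move.
        pos = [p + 1 if p < exit_idx and rng.random() < p_move else p for p in pos]
        counts = Counter(ZONES[p] for p in pos)
        global_history.append(counts)
        agent_history.append((t, ZONES[pos[0]]))
        # A zone counts as evacuated once it and every zone behind it are empty
        # (agents only move forward, so such a zone can never refill).
        for k, zone in enumerate(ZONES[:exit_idx]):
            if zone not in cleared and all(counts[z] == 0 for z in ZONES[:k + 1]):
                cleared.add(zone)
                key_events.append((t, f"{zone} evacuated"))
        if counts["exit"] == n_agents:
            key_events.append((t, "station vacated"))
            break
    return key_events, agent_history, global_history

events, agent, global_view = simulate_evacuation(seed=7)
for t, what in events:                  # 1. key events in chronological order
    print(f"t={t:3d}  {what}")
route = [z for i, (_, z) in enumerate(agent) if i == 0 or z != agent[i - 1][1]]
print(" -> ".join(route))               # 2. summary of agent 0's route
print(global_view[1])                   # 3. pedestrians per zone one step after the alarm
```

Only the single-agent trajectory and the per-zone counts reveal *why* the key events happen when they do, which mirrors the point made for the second and third views.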

Although individual histories are interesting, they can be misleading, especially if the model incorporates random elements. For example, simulations often use a random number generator to imitate the decision-making process of an agent (e.g. direction choices, mood preferences), to randomize the order in which agents move, or to substitute for an unmeasured parameter (equivalent to the modeler making a guess in the absence of more accurate information). To determine whether the conclusion from a simulation run is typical, it is necessary to repeat the simulation a number of times using identical parameters and initial conditions, but different random number seeds. This helps distinguish whether particular patterns observed in a single illustrative history are idiosyncratic or typical. Results from these simulation runs need to be presented as distributions, or as means with confidence intervals. Statistical analysis is then required to assess the variation in the model output, and to determine whether inferences drawn from the simulation histories are well founded.
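A minimal Python sketch of replication with different seeds follows. The toy model `evacuation_time` and the helper `replicate` are hypothetical; the confidence interval uses a normal approximation (Python 3.8+ `statistics.NormalDist`), which is reasonable for around 30 runs but slightly too narrow for very few runs, where a Student-t quantile would be more appropriate.

```python
import random
import statistics

def evacuation_time(seed, n_agents=50, n_zones=3, p_move=0.6):
    """Hypothetical toy model: time steps until every agent has crossed n_zones zones."""
    rng = random.Random(seed)           # one independent stream per replication
    remaining = [n_zones] * n_agents    # zone crossings each agent still needs
    t = 0
    while any(remaining):
        t += 1
        # The random draw stands in for an agent's stochastic decision to move on.
        remaining = [r - 1 if r and rng.random() < p_move else r for r in remaining]
    return t

def replicate(model, n_runs=30, confidence=0.95, **params):
    """Run `model` with identical parameters but seeds 0..n_runs-1; summarize the output."""
    outputs = [model(seed=s, **params) for s in range(n_runs)]
    mean = statistics.mean(outputs)
    sd = statistics.stdev(outputs)
    # Normal approximation to the sampling distribution of the mean.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / n_runs ** 0.5
    return outputs, mean, (mean - half_width, mean + half_width)

outputs, mean, (low, high) = replicate(evacuation_time, n_runs=30, p_move=0.6)
print(f"mean evacuation time {mean:.1f} steps, 95% CI [{low:.1f}, {high:.1f}]")
print("range across seeds:", min(outputs), "to", max(outputs))
```

Because each run is driven by its own seeded generator, any single history can be reproduced exactly by rerunning it with the same seed, which makes it easy to revisit a run that looks idiosyncratic.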

It is usually desirable to engage in sensitivity analysis once a model appears to be valid (at least for a specific set of initial conditions and parameter values). The aim of sensitivity analysis is to determine the extent to which variations in the model’s assumptions yield differences in the model output. The principle is to vary the initial conditions and parameters of the model by a small amount and observe the resulting differences in model outcomes. For example, a model might be run several times while a given parameter is varied between 10% below and 10% above its original value. If the impact on the output is negligible, it can reasonably be assumed that the parameter is not of critical importance to the model, and that its accuracy is not of major concern. However, caution is needed, since complex systems can exhibit large and sudden shifts in behavior in response to relatively small perturbations in inputs.
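The ±10% procedure can be sketched as a one-at-a-time sensitivity analysis in Python. Everything here is a hypothetical illustration: `toy_model` stands in for an ABM, and `one_at_a_time` averages each setting over 20 seeds. Reusing the same seeds at every level (a common-random-numbers design) is an assumed design choice that makes differences between levels less likely to be mere noise from different random draws.

```python
import random
import statistics

def toy_model(seed, p_move=0.6, n_agents=50, n_zones=3):
    """Hypothetical stand-in for an ABM: steps until every agent crosses n_zones zones."""
    rng = random.Random(seed)
    remaining = [n_zones] * n_agents
    t = 0
    while any(remaining):
        t += 1
        remaining = [r - 1 if r and rng.random() < p_move else r for r in remaining]
    return t

def one_at_a_time(model, base, name, rel_change=0.10, seeds=range(20)):
    """Mean output with `name` at -rel_change, 0 and +rel_change; other parameters fixed."""
    means = []
    for factor in (1 - rel_change, 1.0, 1 + rel_change):
        value = base[name] * factor
        if isinstance(base[name], int):
            value = round(value)          # keep integer parameters (e.g. counts) integral
        params = {**base, name: value}
        # The same seeds are reused at every level (common random numbers).
        means.append(statistics.mean([model(seed=s, **params) for s in seeds]))
    return tuple(means)

base = {"p_move": 0.6, "n_agents": 50, "n_zones": 3}
for name in ("p_move", "n_agents"):
    low, mid, high = one_at_a_time(toy_model, base, name)
    print(f"{name:>8}: -10% -> {low:.2f}, baseline -> {mid:.2f}, +10% -> {high:.2f}")
```

A small spread between the three means suggests the parameter is not critical within this range; as the text cautions, it says nothing about abrupt shifts that a nonlinear model might show elsewhere.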

Sensitivity analysis is also used to investigate the robustness of the model. If the behavior of the model is very sensitive to small differences in the values of one or more parameters, the modeler may need to be more concerned about whether these particular values are correct. Unfortunately, even with a small number of parameters, the number of parameter combinations grows multiplicatively (e.g. five parameters, each tested at three levels, already give 243 combinations before any repeated runs), and the resources required to perform a thorough analysis can become excessive. In practice, a modeler is likely to have a good intuition about which parameters will have the largest impact upon the output of the model, and therefore which parameters are most critical to examine. The effect of different model versions can also be assessed by running controlled experiments with sets of simulation runs, akin to the evaluation of parameter changes. Differences in the logic of a model (e.g. changes in the rules governing agent behavior and/or interaction) can be studied by systematically comparing different versions of the model. However, it is imperative that initial conditions are kept identical for any comparison to be valid.
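Both points can be illustrated with a short Python sketch: counting the runs a full factorial design would require, and comparing two versions of an agent rule from identical initial conditions and seeds. The functions `full_factorial`, `run`, `baseline_rule` and `herding_rule` are hypothetical, and the "herding" rule is an invented alternative used only to show the paired comparison.

```python
import itertools
import random
import statistics

def full_factorial(levels):
    """Every combination of the given parameter levels, as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels.values())]

grid = {"p_move": [0.5, 0.6, 0.7], "n_agents": [40, 50, 60], "n_zones": [2, 3, 4]}
print(len(full_factorial(grid)))  # 27 = 3**3; ten such parameters would need 3**10 = 59049

def run(rule, initial, seed, p_move=0.6):
    """Toy evacuation; `initial` gives the zones each agent must still cross."""
    rng = random.Random(seed)
    remaining = list(initial)         # copy: every version starts from the same state
    t = 0
    while any(remaining):
        t += 1
        frac_out = remaining.count(0) / len(remaining)
        remaining = [r - 1 if r and rule(rng, p_move, frac_out) else r for r in remaining]
    return t

def baseline_rule(rng, p, frac_out):
    return rng.random() < p

def herding_rule(rng, p, frac_out):
    # Hypothetical alternative rule: agents hurry as more others have already left.
    return rng.random() < min(1.0, p + 0.3 * frac_out)

# Identical initial conditions and seeds for both versions; only the rule differs.
init_rng = random.Random(42)
initial = [init_rng.randint(1, 3) for _ in range(50)]
diffs = [run(baseline_rule, initial, s) - run(herding_rule, initial, s) for s in range(30)]
print(f"mean difference {statistics.mean(diffs):.2f} steps (sd {statistics.stdev(diffs):.2f})")
```

Pairing runs by seed and starting state means each difference isolates the change in logic; comparing versions started from different initial conditions would mix that effect with the effect of the setup.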

There are a few caveats that must be considered when validating and analyzing the output of a model. Firstly, both the model and the system under analysis are likely to be stochastic. Thus, the model output and data from the real-world system are unlikely to correspond exactly on every occasion. Whether a given difference is large enough to cast doubt upon the model depends partly on the expected statistical distribution of the simulation output. Unfortunately, these distributions are rarely known a priori and are difficult to estimate with simulations, especially if outcomes are emergent. Secondly, the predictive capability of a model is limited, since its predictions will almost certainly be conditional (i.e. it is unlikely that all postulated outcomes can be produced). For instance, a model may be able to produce plausible future predictions, but may not be able to recreate known past system states. Thirdly, it is possible that the model is correct but the data from the real-world system are not (e.g. the data may rest on inappropriate assumptions or estimates). Finally, many simulations are path dependent (i.e. the outcome of a simulation depends upon the exact initial setup chosen).

Calibration and validation are arguably amongst the most challenging issues in ABM. Even if there is correspondence between a model’s output and a real-world system, this is not a sufficient condition to conclude that the model is correct (Gilbert, 2004). Many different processes could yield a given outcome, and the fact that a model generates similar outcomes does not prove that the processes included within the model account for the real-world outcome. Nevertheless, a model should be regarded as a basis for reducing uncertainty about the future, from a prior state of unawareness to one of more limited uncertainty.
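One simple way to handle the first caveat is to estimate the output distribution by simulation and ask how unusual a real-world observation is under it (a Monte Carlo test). The Python sketch below assumes a hypothetical `toy_model` and a placeholder observation; `empirical_p_value` is an illustrative helper, and `statistics.quantiles` requires Python 3.8+. A small p-value suggests a mismatch worth investigating, while a large one does not show that the model is correct.

```python
import random
import statistics

def toy_model(seed, p_move=0.6, n_agents=50, n_zones=3):
    """Hypothetical stand-in for an ABM: steps until every agent crosses n_zones zones."""
    rng = random.Random(seed)
    remaining = [n_zones] * n_agents
    t = 0
    while any(remaining):
        t += 1
        remaining = [r - 1 if r and rng.random() < p_move else r for r in remaining]
    return t

def empirical_p_value(simulated, observed):
    """Two-sided Monte Carlo p-value of `observed` against simulated model outputs."""
    n = len(simulated)
    at_or_below = sum(x <= observed for x in simulated)
    at_or_above = sum(x >= observed for x in simulated)
    # The +1 terms count the observation itself, so the p-value is never exactly zero.
    p = 2 * min((1 + at_or_below) / (n + 1), (1 + at_or_above) / (n + 1))
    return min(1.0, p)

simulated = [toy_model(seed=s) for s in range(200)]
observed = 12   # hypothetical placeholder, not a real measurement
cuts = statistics.quantiles(simulated, n=20)   # 5th, 10th, ..., 95th percentiles
print(f"middle 90% of simulated runs: {cuts[0]} to {cuts[-1]} steps")
print(f"two-sided empirical p-value: {empirical_p_value(simulated, observed):.3f}")
```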
A model should not be considered a failure if its predictions are not perfectly accurate, or if the modeler is left unsure whether the processes included within the model account for the real-world outcome. It takes time and many refinements, but a modeler can gradually increase confidence in a model by testing it against real-world data in more and more ways.