Sunday, 2 August 2020

Design of objective functions for model calibration

Introduction

Model calibration, or parameter estimation, is the task of tailoring the free parameters of a model to best explain the known observations. This step is vital because making new predictions from the model follows it: for the predictions to be accurate, the free parameters first have to be identified from the known data. An effective passage through this step requires fighting on multiple fronts. The first is the design of the model itself. This seems trivial to say, but it is often not easy to test all the behaviors of a model, so model design is in general an iterative process. If the starting point is a first-order polynomial relation and it doesn't accurately fit the observations, it might be better to try higher-order non-linear relations.

The second front in attending to the calibration task effectively is choosing a search strategy. For continuous and differentiable functions, calculus gives us closed-form solutions through derivatives; in reality, however, we often encounter functions too complicated to differentiate. In such cases we rely on search/heuristic methods, ranging from a simple grid search to advanced meta-heuristic methods (a Genetic Algorithm, for example).

Because an effective search relies on evaluating fitness values at different positions in the search space, choosing an appropriate objective function is the third battle front. An objective function maps a position in the space to a value (a scalar or a vector). If the goal of the search is to reach a known destination point, the objective function can be thought of as a distance measure. In some cases we don't know the destination exactly but understand its characteristics, for example that the final point maximizes or minimizes some measure. In either case, designing an objective function that resonates with the particular instance is crucial to reaching the ideal solution. For example, consider a cube where a particle can only travel along the edges connecting the vertices. If the particle is to move from one vertex to another, a Euclidean distance measure is ineffective compared with a Manhattan distance measure: since the particle cannot move along a diagonal, the Euclidean distance does not give a true measure of the distance between vertices on opposite ends of a diagonal, but the Manhattan distance does.

The path taken by a particle moving along the edges of a cube from A to G is better expressed by the Manhattan distance (dark edges) than by the Euclidean distance (dotted red)
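To make this concrete, here is a minimal sketch in R. The coordinates assume a unit cube with opposite vertices A = (0,0,0) and G = (1,1,1), matching the figure's labels; this is my own illustration, not code from the original post.

```r
# Opposite corners of a unit cube, as in the figure above (assumed coordinates).
A <- c(0, 0, 0)
G <- c(1, 1, 1)

euclidean <- function(p, q) sqrt(sum((p - q)^2))
manhattan <- function(p, q) sum(abs(p - q))

euclidean(A, G)  # ~1.73: the diagonal, which an edge-bound particle cannot take
manhattan(A, G)  # 3: the number of unit edges actually traversed
```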

In this post I will share two instances where designing a functional form that incorporates the nuances of the case helped me implement an effective optimization.

Current value vs Cumulative value

In a path-finding problem the goal is to find a path between two points on a given map; for example, we want to find a route from our home to a restaurant. To be efficient, one would like to follow the shortest path, or at least a thrifty one. The problem gets more realistic if the map contains roadblocks, circular paths, sections with terrain that is difficult to drive on, and so on. To gauge the efficiency of a chosen path it is necessary to know the distance between points. On a map like a well-planned city, the Manhattan distance is a good measure, for the same reason as with the cube. But in such a setting one can easily run into the problem of circular paths, i.e. returning to already-visited points, and hence get stuck in a loop. So an objective function better than just the distance to the goal is needed. In this instance that is a cumulative distance: instead of only measuring the distance to the destination from the current location, it is better to keep track of the distance traveled so far.

In a restricted map, a simple measure of the distance from the current location to the destination can lead to an impasse. A cumulative measure will play the long game and consider more options

An accumulated measure doesn't avoid a path that looks costly in the immediate term but rather plays the long game. Such a fitness measure is needed when there is no path of individually shortest segments linking the origin and the destination; ignoring this leads to never reaching the destination. The sketch below contrasts the two measures on a small grid.
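Here is a minimal sketch in R of this contrast, assuming a hypothetical 5×5 grid with a wall; the grid, the wall, and all function names are my own illustrative assumptions, not taken from the original code. A greedy walker that only looks at the current distance to the goal ends up oscillating in front of the wall, while a search that tracks cumulative distance (here a plain breadth-first search, where every step adds to the distance traveled) finds the detour.

```r
manhattan <- function(a, b) sum(abs(a - b))

n <- 5
blocked <- matrix(FALSE, n, n)
blocked[2:4, 3] <- TRUE            # vertical wall with gaps at the top and bottom rows

# Open, in-bounds neighbours of a cell p = c(row, col)
neighbours <- function(p) {
  moves <- list(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
  ps <- lapply(moves, function(m) p + m)
  Filter(function(q) all(q >= 1 & q <= n) && !blocked[q[1], q[2]], ps)
}

start <- c(3, 1); goal <- c(3, 5)

# Greedy: always step to the neighbour closest to the goal right now.
greedy_path <- function(start, goal, max_steps = 20) {
  path <- list(start); p <- start
  for (i in seq_len(max_steps)) {
    nb <- neighbours(p)
    d  <- sapply(nb, manhattan, b = goal)
    p  <- nb[[which.min(d)]]
    path <- c(path, list(p))
    if (all(p == goal)) break
  }
  path
}

# Cumulative: breadth-first search, i.e. minimise the total steps traveled.
bfs_path <- function(start, goal) {
  key <- function(p) paste(p, collapse = ",")
  frontier <- list(list(start))          # each frontier element is a full path
  seen <- key(start)
  while (length(frontier) > 0) {
    path <- frontier[[1]]; frontier <- frontier[-1]
    p <- path[[length(path)]]
    if (all(p == goal)) return(path)
    for (q in neighbours(p)) {
      if (!(key(q) %in% seen)) {
        seen <- c(seen, key(q))
        frontier <- c(frontier, list(c(path, list(q))))
      }
    }
  }
  NULL
}

greedy_path(start, goal)  # oscillates in front of the wall, never arrives
bfs_path(start, goal)     # finds the detour around the wall
```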

Choice of function based on the magnitude of values

In an experiment undertaken to understand the relationship between two variables, multiple observations are noted and model fitting is used to find the best parameters of the model. For this fitting job a distance measure, such as the squared difference between corresponding points, is used to measure the fitness of a particular set of parameters. A slightly different but analogous situation is the multi-objective case, where there are several measurable/observable variables and the model has to be optimized on all measures simultaneously. This array of measures can be converted to the usual scalar output by finding appropriate weights. These weights deserve special attention when combining measures with different ranges, so that a single measure doesn't dominate the search; a simple least-squares distance between the observed and simulated data may not work well. A better way to handle the situation is to bring all objectives into a common range (say [0,1]). This can be achieved by dividing each error by the magnitude of its maximum possible value (or just its order of magnitude: 10, 10^2, 10^3, ...), as in the sketch below.
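As a small illustration in R, suppose one objective is measured in the hundreds and another in fractions; the names and numbers below are made up for the sketch, not from any real model.

```r
# Two error terms on very different scales (illustrative values only).
err_a <- 850     # e.g. squared error of a variable measured in the hundreds
err_b <- 0.04    # e.g. squared error of a variable measured in fractions

# Naive sum: err_a completely dominates the search direction.
err_a + err_b                 # 850.04, err_b is effectively invisible

# Divide each by its maximum possible (or order-of-magnitude) value
# so both land roughly in [0, 1] before combining.
err_a / 1e3 + err_b / 1e-1    # 0.85 + 0.40 = 1.25, both terms now matter
```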

Consider two error/distance measures, ERR1 (squared distance) = ∑ (y_est − y_obs)^2 and ERR2 (fractional change) = ∑ |(y_est − y_obs) / y_obs|. In ERR2 the numerator grows with the same residuals that drive ERR1, so the difference between the two measures is the denominator: the inverse values 1/y_obs can be thought of as weights. If all y_obs are in a similar range, then all error terms are weighted similarly. But if there are some outliers with large y_obs, the function assigns them a lower weight, damping their influence. To demonstrate this I will work out a hypothetical example with these two objective functions.

In the figure below, the red dots are observations from a hypothetical experiment capturing the relation between two variables X and Y. Theoretically the relation between them is given by Y = X. The experimental observations are subject to external noise and hence the relation is not easily seen, but a careful inspection highlights the outliers at x=3 and x=9. In the code you can see that I added random normal noise at all points and extra noise to create outliers at x=3, 6, 9.

A squared-error measure is sensitive to outliers in the data; designing an objective function that incorporates the peculiarities of the case will be more effective

Now, looking at the average error between the family of lines Y = X + intercept and the observations, the minimum error with ERR2 is at intercept = 0, whereas the minimum error with ERR1 is at intercept = 2. The outlier at x=3 is pulling up the overall fit, showing that ERR1 is more sensitive to this outlier than ERR2.
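The author's actual R code for this example is on GitHub; the following is my own reconstruction of the setup under assumed noise magnitudes, so the exact minimizers depend on the random draw, but the qualitative gap between ERR1 and ERR2 is the same.

```r
set.seed(1)
x    <- 1:10
yobs <- x + rnorm(10, sd = 0.3)                     # Y = X plus normal noise
yobs[c(3, 6, 9)] <- yobs[c(3, 6, 9)] + c(4, 1, 3)   # assumed extra noise -> outliers

intercepts <- seq(-1, 3, by = 0.1)
err1 <- sapply(intercepts, function(b) mean((x + b - yobs)^2))            # ERR1
err2 <- sapply(intercepts, function(b) mean(abs((x + b - yobs) / yobs)))  # ERR2

intercepts[which.min(err1)]  # pulled up towards the outliers
intercepts[which.min(err2)]  # stays near the true intercept of 0
```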

In the above example the dominant outliers have a higher magnitude than the other observations, hence the weighting 1/y_obs reduces the importance attached to those points. The same weighting scheme will not work when an outlier has a lower magnitude than the general trend: there, 1/y_obs attaches more importance to the outlier during fitting, and a different approach, such as multiplying by y_obs instead of dividing, will prove useful, as the small sketch below shows.
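A tiny sketch of that reversed situation, with my own toy numbers (one observation far below the trend):

```r
yobs <- c(100, 105, 0.5, 110)   # third point is a low-magnitude outlier
yest <- c(101, 104, 1.0, 112)

abs((yest - yobs) / yobs)  # dividing: the outlier's term (1.0) dwarfs the ~0.01 terms
abs((yest - yobs) * yobs)  # multiplying: the outlier's term (0.25) is damped instead
```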

Conclusion

In this text I discussed the model calibration practiced in mathematical model development. Being effective in this phase requires the modeler to assess several items: the mathematical description of the relations in the model, a good search strategy, and the design of the error or fitness measure. Here I made the case for the importance of the design of objective functions, with examples to illustrate the point.

In reality it may be wise to work on multiple fronts at once. In my own task I encountered the case of optimizing a model for multiple objectives simultaneously. I chose to reduce the multiple objectives to a scalar, where using a fractional error measure spared me from worrying over which weights to use. I then implemented a global optimization strategy, a Genetic Algorithm. For the implementation to be practical I vectorized the operations in the program, and trying out different choices of ODE solvers gave me an economical option, bringing the optimization task from an estimated run time of over 20 hours down to slightly over 1 hour.

The R code for the last example is hosted on GitHub.
