Conceptual overview of library¶

ODE’s and their time evolution¶

To keep it as general as possible we do not assume anything about the dependency of the model on the optimized input parameters. Therefore, we are required to study their effect on the model by examining the results of the integrated ODE’s.

We assume a first-order coupled ODE system of the following form:

$\large \partial_{t} u^{i} = \alpha^{i}_{j} u^{j}$

Where alpha is the set of real valued coefficients. A simple example would be something of the form:

$\large \partial_{t} u^{1} = u^{2}$

$\large \partial_{t} u^{2} = u^{1}$

Here, alpha takes the form:

$\large \alpha^{i}_{j} = \left\lbrace \begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix} \right\rbrace$

Fitting¶

Stability and Suitability¶

We require that the desired state of the ODE system is a steady state of the system. From an optimization point of view,we usually also desire a certain steady state of the system. This gives us the following two constrains of stability and suitability:

$\large \text{Stability} \; \; \exists \; i : j>i \; || G_{j}(u_{0},\alpha) - G_{i}(u_{0},\alpha) || < \epsilon_{1}$

$\large \text{Suitability} \; \; \exists \; x : || G(u_{0},\alpha) - y || < \epsilon_{2}$

Where u_{0} is the vector containing the quanteties observed in the ODE in their initial state (t=0). alpha is a matrix containg the coupling coefficients of them. G(u,alpha) is the map of time evolution of the system defined by u, alpha. The subscripts (i and j) represent the iteration step of the time evolution and both epsilon are small (compared to F(x)) positive real valued objects. The desired state of the system is represented by the vector y. The subscripts (i and j) represent the iteration step of the time evolution and both epsilon are two small (compared to G) positive real valued objects.

In general, the parameters we might desire to be optimised are elements of u and alpha. In the following we will refer to them as free parameters (x). As stated above they are

$\large x \subset (u_{0} \cup \alpha)$

All elements of u_{0} and alpha are constant in the optimization process. Therefore, we will in the following only discuss the behaviour of the time evolution dependant on the free parameters F(x). F(x) is a different representation of the same map. Hence,

$\large F(x) = G(u_{0},\alpha)$

Cost/Loss function¶

As implied by the suitability condition, we can define a measure for how close the system represents the desired system. Keep in mind that we only consider steady state outputs of the integrated ODE F(x). This measure is usually referred to as ‘objective function’, but often also called cost- or loss function. Hence, for every set of free parameter (x) we can calculate a cost function J(x). The standard objective function is of the form

$\large J(x,y) = \frac{1}{N_{dim}} \left\lbrace F(x)-y)^{2} \right\rbrace$

The effect of the different free parameters can be examined through a perturbation approach. The free parameters are perturbed slightly.

$\large x_{perturbed} = x + \rho \;\; \text{with}\;\; \rho \ll x$

Then we recalculated the time evolution of the ODE system and calculate the new cost function.

Theoretically, we could continue to sample the parameter space of the free parameters in this way or by testing a homogeneous distribution of free parameters over the sample space. However, this is computationally costly because it requires the evaluation of a large set of free parameters. Especially for a high dimensional search space, meaning a large set of observed quantities modelled by the ODE and/or a large set of free parameters, this becomes unfeasible.

Therefore we require a better sampling approach to reduce the computational cost of finding an optimal set of free parameters.

Gradient Descent¶

The approach chosen in this library is called ‘Gradient Descent’. Its core idea is that we calculate a local gradient of the unknown cost function field and always move along the direction of the steepest descent. This way we find in principle the closest local minima without sampling the whole region but only one path. This drastically cuts the computational cost.

In its simplest form such a gradient descent is given by

$\large x^{n+1} = x^{n} - \lambda \nabla J(x^{n})$

where x^{n+1} is the next tested free parameter set and lambda is the step size, sometimes also called learning rate, of the gradient of J. The gradient always points into direction of the steepest ascent. Hence, by going into the opposite direction we are always walking downhill on the steepest path.

This approach is also used in many machine learning problems and therefore well studied. Its great advantage is that we only require the field of the cost function J(x,y) to be continuous and differentiable. This enables this approach to be applicable in many different problems. However, the generality of this approach introduces certain problems as well. Firstly, it is generally unknown if an ideal solution exists. Secondly, assuming such an ideal solution exists, it is not guaranteed that the algorithm finds it. Hence, if the best output of the algorithm is still not satisfactory, one does not know if the studied system does not have a better solution or if some parameter of the algorithm was badly chosen.