Data-Driven Discovery of Stochastic Dynamical Systems through Ensemble-SINDy Methods
Summary
Modelling natural phenomena is crucial for understanding dynamical systems and their behaviour. Several methods exist for systematically identifying the underlying models from data. One of these is Sparse Identification of Nonlinear Dynamics (SINDy), which aims to select, from a library of candidate terms, the terms relevant for describing a system’s behaviour. SINDy has been extended with ensembling techniques (E-SINDy): bagging E-SINDy, library E-SINDy (LB-SINDy) and double bagging E-SINDy (DB-SINDy), where DB-SINDy combines library E-SINDy and bagging E-SINDy. In this thesis, we study the performance and accuracy of these model discovery methods on noisy data. Specifically, we generate synthetic data from the Lotka–Volterra system, which allows us to compare the identified parameters with the simulation parameters. Unlike prior research, which used either real-world data or synthetic data with measurement noise only, we also add dynamical noise to our data, which influences the dynamics of the system itself. To discover the model from data, we need to tune the hyperparameters of the methods, as their performance strongly depends on them. We therefore explore the feasibility of hyperparameter optimisation based purely on data, rather than on the true parameter values. We find that, although challenging, this is achievable. Furthermore, we investigate the distribution of the identified parameters within a bagging E-SINDy ensemble. This distribution provides insight into the accuracy of the identified parameters, which is an advantage of the bagging E-SINDy method. Lastly, we examine the performance of SINDy and the E-SINDy methods on data with higher levels of measurement noise. While SINDy and bagging E-SINDy struggle to identify the underlying model correctly from data with high levels of measurement noise, LB-SINDy and DB-SINDy prove more robust to noisy data.
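To make the setting concrete, the following is a minimal sketch of the kind of pipeline described above, not the implementation used in the thesis. It simulates a Lotka–Volterra system with dynamical noise, adds measurement noise, and applies a bagging ensemble of sparse regressions over a polynomial library. Everything specific here is an assumption chosen for illustration: the parameter values, the deliberately low noise levels, the Euler–Maruyama integrator, the degree-2 library, the threshold of 0.2, median aggregation, and all function names.

```python
import numpy as np

def simulate_lv(params, x0, dt, n_samples, dyn_noise, rng, substeps=10):
    """Euler-Maruyama integration of Lotka-Volterra with additive dynamical noise."""
    a, b, c, d = params
    h = dt / substeps  # finer internal step than the sampling interval
    x = np.array(x0, dtype=float)
    out = np.empty((n_samples, 2))
    for k in range(n_samples):
        out[k] = x
        for _ in range(substeps):
            u, v = x
            drift = np.array([a * u - b * u * v, -c * v + d * u * v])
            # The noise enters the state itself, so it alters the later dynamics.
            x = x + drift * h + dyn_noise * np.sqrt(h) * rng.standard_normal(2)
    return out

def library(x):
    """Candidate terms: 1, u, v, u^2, u*v, v^2."""
    u, v = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(u), u, v, u * u, u * v, v * v])

def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares (the sparse regression in SINDy)."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            big = ~small[:, j]
            if big.any():
                # Refit only the terms that survived thresholding.
                xi[big, j] = np.linalg.lstsq(theta[:, big], dxdt[:, j], rcond=None)[0]
    return xi

def bagging_esindy(theta, dxdt, threshold, n_models, rng):
    """Fit one sparse model per bootstrap resample of the rows; aggregate by median."""
    n = theta.shape[0]
    models = np.array([
        stlsq(theta[idx], dxdt[idx], threshold)
        for idx in (rng.integers(0, n, size=n) for _ in range(n_models))
    ])
    return np.median(models, axis=0), models

rng = np.random.default_rng(0)
a, b, c, d = 1.0, 0.5, 1.0, 0.5   # assumed simulation parameters
dt = 0.01

x = simulate_lv((a, b, c, d), x0=(1.0, 1.0), dt=dt, n_samples=5000,
                dyn_noise=0.002, rng=rng)
y = x + 0.0002 * rng.standard_normal(x.shape)  # measurement noise on observations

# Central finite differences; drop the one-sided estimates at both ends.
dydt = np.gradient(y, dt, axis=0)[1:-1]
theta = library(y[1:-1])

xi_median, models = bagging_esindy(theta, dydt, threshold=0.2, n_models=20, rng=rng)

# True coefficients in the same library ordering, for comparison.
xi_true = np.zeros((6, 2))
xi_true[1, 0], xi_true[4, 0] = a, -b
xi_true[2, 1], xi_true[4, 1] = -c, d

print(np.round(xi_median, 3))
print("spread across ensemble:", np.round(models.std(axis=0).max(), 4))
```

The array `models` holds one coefficient matrix per bootstrap sample, so its spread gives the kind of per-parameter distribution discussed above. Library ensembling, where candidate terms rather than data rows are resampled, is not shown here.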