Sensitivity Analysis Based Feature-Guided Evolution for Symbolic Regression
Summary
The problem of Symbolic Regression (SR) is to find a mathematical expression that best models a given dataset. Research on SR takes place primarily within Genetic Programming (GP), where the evolutionary algorithm known as Standard GP (SGP) forms the basis. In this work, we set out to improve on SGP with the Sensitivity-based Genetic Programming (SensGP) algorithm.
A thorough examination of the SR literature in the field of GP led to the conclusion that algorithms improving on SGP frequently do so by enhancing the search process. They accomplish this by guiding evolution with additional information about parts of solutions, called features. Comparison experiments between algorithms from the literature confirmed this conclusion. As a result, the feature-guided evolutionary process was chosen as the basis for SensGP.
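To make the notion of a feature concrete, the following Python sketch treats a GP individual as an expression tree and takes its proper, non-leaf subtrees as candidate features. The tuple encoding, the operator set and the functions `evaluate`, `subtrees` and `features` are hypothetical illustrations chosen for this sketch, not the representation used by SensGP.

```python
import numpy as np

# Hypothetical encoding: an expression is a variable name (str), a numeric
# constant, or a tuple (operator, left, right) with a binary operator.
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply}


def evaluate(expr, data):
    """Evaluate an expression tree on a dict mapping variable names to arrays."""
    if isinstance(expr, str):
        return data[expr]
    if isinstance(expr, (int, float)):
        n = len(next(iter(data.values())))
        return np.full(n, float(expr))
    op, left, right = expr
    return OPS[op](evaluate(left, data), evaluate(right, data))


def subtrees(expr):
    """Yield expr and all of its descendants in pre-order."""
    yield expr
    if isinstance(expr, tuple):
        _, left, right = expr
        yield from subtrees(left)
        yield from subtrees(right)


def features(expr):
    """Return the proper, non-leaf subtrees of expr (assumed feature definition)."""
    if not isinstance(expr, tuple):
        return []
    _, left, right = expr
    return [s for child in (left, right) for s in subtrees(child) if isinstance(s, tuple)]


model = ("+", ("*", "x", "x"), ("*", 2.0, "y"))  # x*x + 2*y
data = {"x": np.array([1.0, 2.0]), "y": np.array([3.0, 4.0])}
print(features(model))                 # [('*', 'x', 'x'), ('*', 2.0, 'y')]
print(evaluate(model, data).tolist())  # [7.0, 12.0]
```

Because each feature is itself an expression, it can be evaluated on the data independently of the individual it came from, which is what makes per-feature scoring possible.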
At the core of the SensGP algorithm lies the use of sensitivity analysis to measure feature importance. The Mean Squared Error (MSE) of a feature is measured twice: once on the original data and once on a copy of the data in which the values of selected variables have been shuffled. The difference between these two MSE values is used to calculate a feature importance score, which is later used to reintroduce features into the population.
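The shuffle-and-compare step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a feature is a callable on an `(n_samples, n_vars)` array whose output is compared directly with the target `y`, the importance score is taken to be the raw increase in MSE, and reintroduction is modelled as sampling features with probability proportional to their score (negative scores clipped to zero). The names `sensitivity_score` and `pick_features` are hypothetical.

```python
import numpy as np


def mse(pred, y):
    """Mean Squared Error between predictions and targets."""
    return float(np.mean((pred - y) ** 2))


def sensitivity_score(feature, X, y, variables, rng):
    """Increase in a feature's MSE when the given variable columns are shuffled.

    feature   -- callable mapping an (n_samples, n_vars) array to n_samples outputs
    variables -- indices of the columns to shuffle (assumed: the feature's variables)
    """
    base = mse(feature(X), y)
    X_shuffled = X.copy()
    for j in variables:
        # Permuting a column keeps its distribution but breaks its link to y.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    return mse(feature(X_shuffled), y) - base


def pick_features(features, scores, k, rng):
    """Sample k features with probability proportional to their (clipped) scores."""
    weights = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    if weights.sum() == 0.0:
        weights = np.ones_like(weights)  # no informative scores: sample uniformly
    idx = rng.choice(len(features), size=k, p=weights / weights.sum())
    return [features[i] for i in idx]


def f_square(X):
    return X[:, 0] ** 2  # captures the dominant term of y


def f_linear(X):
    return 0.1 * X[:, 1]  # captures only a small term of y


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] ** 2 + 0.1 * X[:, 1]
scores = [sensitivity_score(f_square, X, y, [0], rng),
          sensitivity_score(f_linear, X, y, [1], rng)]
chosen = pick_features(["x0^2", "0.1*x1"], scores, k=3, rng=rng)
```

Shuffling the variable behind `f_square` destroys most of its fit, so it receives a much larger score than `f_linear` and dominates the sampled features.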
In addition to the basic SensGP algorithm, we experimented with two variants: a Model Dependent approach, in which only the models with the lowest MSE are used to calculate feature importance, and a Variable Importance approach, in which the importance of a feature is measured by the sensitivity of the model it appears in to the variables the feature contains.
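Both variants can be sketched in a few lines of Python. The sketch assumes that models are callables on an `(n_samples, n_vars)` array, that "the models with the lowest MSE" means a fixed number `n_best` of top-ranked models, and that a feature's variables are given as column indices. The names `best_models` and `variable_importance` are hypothetical and only illustrate the idea.

```python
import numpy as np


def mse(pred, y):
    """Mean Squared Error between predictions and targets."""
    return float(np.mean((pred - y) ** 2))


def best_models(models, X, y, n_best):
    """Model Dependent variant: keep only the n_best models with the lowest MSE."""
    return sorted(models, key=lambda m: mse(m(X), y))[:n_best]


def variable_importance(model, X, y, feature_vars, rng):
    """Variable Importance variant: score a feature by how much the MSE of the
    model it appears in rises when the feature's variables are shuffled."""
    base = mse(model(X), y)
    X_shuffled = X.copy()
    for j in feature_vars:
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    return mse(model(X_shuffled), y) - base


def good(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]  # matches y exactly


def poor(X):
    return 3.0 * X[:, 0]  # ignores x1


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]

top = best_models([poor, good], X, y, n_best=1)    # only `good` survives
imp_x0 = variable_importance(good, X, y, [0], rng)  # a feature using x0
imp_x1 = variable_importance(good, X, y, [1], rng)  # a feature using x1
```

The difference from the basic scheme is what gets re-evaluated after shuffling: the whole model rather than the isolated feature, so a feature is credited for how much its variables matter to the model as a whole.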
We compared these algorithms to SGP in a number of configurations. The experiments showed no statistically significant difference in model quality between SGP and SensGP, but we present a number of ways in which SensGP might be further refined. Further research will have to establish whether these adjustments can make SensGP a useful addition to the variety of SR algorithms in the field of GP.