Unsupervised drift detection in accounting data
Summary
As machine learning (ML) becomes more popular and available to all sectors,
ML models have to be maintained and their performance has to be monitored.
Large-scale disruptive events such as the COVID-19 pandemic
have a big influence on society and possibly also on the data that reflects
it. As a result, the performance of an ML model might decrease substantially.
Such a change in the data is difficult to monitor in the absence of labels. As
this project is a collaboration with the Auditdienst Rijk, and labels are not
readily available in their data environment, this paper proposes the use of
the STUDD method to detect drift in an unsupervised way. The main hypothesis
is that drift should be detected in new, unseen accounting data around
early 2020, when new COVID-19-related policies were implemented and
affected the budget and spending patterns of the Dutch government. Here
we show that the STUDD method successfully detected drift in the new,
unseen data early on in the pandemic year. However, this change cannot
be attributed to the spread of COVID-19, as policies were implemented a
substantial time after the first drift was detected. This might indicate other
reasons for the changes in the data, such as the passage of time or external
events that occurred in the previous year and had already induced drift.