Detecting Prediction Influence Drift In Data Streams

Jelsma, T.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Krempl, G.M.
dc.contributor.author	Jelsma, T.
dc.date.accessioned	2021-07-27T18:00:58Z
dc.date.available	2021-07-27T18:00:58Z
dc.date.issued	2021
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/40050
dc.description.abstract	Today, many machine learning models are deployed to make decisions. These models assume that the data are stationary over time. When the data change over time because of a dynamic environment, this is called concept drift. However, the predictions of a model can also influence future data. For example, if a model predicts that stock prices will increase, more stocks are bought, and the stock prices will indeed move up. The opposite is also possible: if malware is detected, it is redesigned in such a way that it will not be detected as malware in the future. These mechanisms are called self-fulfilling and self-defeating prophecies, respectively. This thesis aims to distinguish between prediction influence drift and intrinsic drift. First, a data generator is developed in which a feedback loop exists between the data and the predictions of a machine learning model. Next, a prediction influence detection approach is created that calculates the density trajectories of the data before and after classification. Linear regression models are fitted on the trajectories before classification. If the predictions of these regression models differ from the density trajectories after classification, this indicates that drift started after classification. If prediction influence exists, the differences between the predictions and the trajectories within a true class are expected to differ. The detection approach is evaluated on synthetic data with and without prediction influence, and with and without intrinsic drift. Density trajectories on real-world data are presented and discussed. The data generator shows promising results, because the performance of a model under prediction influence is affected in the expected way. The detection approach does not show clear results: the density trajectories differ per type of influence, but the differences between predicted and calculated densities cannot be distinguished. The trajectories on real-world data are interesting, but because true labels are lacking, the approach cannot detect any prediction influence there. A first step toward detecting prediction influence has been taken, but the current approach does not work. The results do show a clear influence on the performance of the machine learning models. Both the prediction influence data generator and the density trajectories look promising and could be used in new approaches.
dc.description.sponsorship	Utrecht University
dc.format.extent	3860379
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Detecting Prediction Influence Drift In Data Streams
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Prediction Influence, Concept Drift, Self-Fulfilling Influence, Self-Defeating Influence, Data Generation
dc.subject.courseuu	Business Informatics

Files in this item

Name: Tineke_Jelsma_Masters_Thesis.pdf
Size: 3.681Mb
Format: PDF


This item appears in the following Collection(s)

Theses
