
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Krempl, G.M.
dc.contributor.author: Jelsma, T.
dc.date.accessioned: 2021-07-27T18:00:58Z
dc.date.available: 2021-07-27T18:00:58Z
dc.date.issued: 2021
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/40050
dc.description.abstract: Today, many machine learning models are deployed to make decisions. These models commonly assume that the data are stationary over time. When the data change over time because of a dynamic environment, this is called concept drift. However, a model's predictions can themselves influence future data. For example, if a model predicts that stock prices will increase, more stocks are bought and prices do indeed move up. The opposite is also possible: once malware is detected, it is redesigned so that it will not be detected as malware in the future. These mechanisms are called self-fulfilling and self-defeating prophecies, respectively. This thesis aims to distinguish between prediction influence drift and intrinsic drift. First, a data generation method is developed in which a feedback loop exists between the data and the predictions of a machine learning model. Next, a prediction influence detection approach is created that calculates the density trajectories of data before and after classification. Linear regression models are fitted on the trajectories before classification. If the regression predictions differ from the density trajectories observed after classification, this indicates the start of drift after classification. If prediction influence exists, the differences between the regression predictions and the trajectories within a true class are expected to differ. The detection approach is evaluated on synthetic data with and without prediction influence, and with and without intrinsic drift. Density trajectories on real-world data are presented and discussed. The data generator shows promising results: the performance of a model subject to prediction influence is affected as intended. The detection approach does not show clear results. The density trajectories differ per type of influence, but the differences between predicted and calculated densities cannot be distinguished.
The trajectories on real-world data are interesting, but because true labels are lacking, the approach cannot detect any prediction influence there. A first step towards detecting prediction influence has been taken, but the current approach does not work. The results do show a clear influence on the performance of the machine learning models. Both the prediction influence data generator and the density trajectories look promising and could be used in new approaches.
dc.description.sponsorship: Utrecht University
dc.format.extent: 3860379
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Detecting Prediction Influence Drift In Data Streams
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Prediction Influence, Concept Drift, Self-Fulfilling Influence, Self-Defeating Influence, Data Generation
dc.subject.courseuu: Business Informatics
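
The detection idea summarised in the abstract (density trajectories, a linear regression fitted on the part before classification, and a comparison with the densities observed afterwards) can be illustrated with a minimal sketch. This is not the thesis's implementation: the one-dimensional data, the Gaussian kernel, the bandwidth, the single evaluation point, and the function names `density_trajectory` and `extrapolation_gap` are all assumptions made for illustration only.

```python
import numpy as np


def density_trajectory(batches, point, bandwidth=0.5):
    """Gaussian kernel density estimate at `point`, one value per batch (1-D data).

    Hypothetical helper: the kernel and bandwidth are illustrative choices.
    """
    norm = bandwidth * np.sqrt(2.0 * np.pi)
    traj = []
    for batch in batches:
        z = (np.asarray(batch, dtype=float) - point) / bandwidth
        traj.append(np.exp(-0.5 * z**2).sum() / (len(batch) * norm))
    return np.array(traj)


def extrapolation_gap(trajectory, n_before):
    """Fit a line to the first `n_before` densities and extrapolate it.

    Returns observed minus predicted density for the remaining time steps.
    """
    t = np.arange(len(trajectory))
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, intercept = np.polyfit(t[:n_before], trajectory[:n_before], deg=1)
    predicted = slope * t[n_before:] + intercept
    return trajectory[n_before:] - predicted


# Synthetic example (assumed setup): five batches before "classification",
# followed by five batches that either stay put or shift their mean.
rng = np.random.default_rng(0)
before = [rng.normal(0.0, 1.0, 2000) for _ in range(5)]
after_stable = [rng.normal(0.0, 1.0, 2000) for _ in range(5)]
after_drift = [rng.normal(2.0, 1.0, 2000) for _ in range(5)]

gap_stable = extrapolation_gap(density_trajectory(before + after_stable, point=0.0), 5)
gap_drift = extrapolation_gap(density_trajectory(before + after_drift, point=0.0), 5)

print("mean gap without drift:", gap_stable.mean())
print("mean gap with drift:   ", gap_drift.mean())
```

In this toy setup, a gap close to zero means the trajectory after classification continues the trend fitted beforehand, while a large gap signals that drift started after classification. Telling prediction influence apart from intrinsic drift would additionally require comparing such gaps within each true class, which this sketch does not attempt.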

