View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Detecting Prediction Influence Drift In Data Streams

        Tineke_Jelsma_Masters_Thesis.pdf (3.681Mb)
        Publication date
        2021
        Author
        Jelsma, T.
        Summary
        Many machine learning models are now deployed to make decisions. These models typically assume that the data are stationary over time. A change in the data over time caused by a dynamic environment is called concept drift. However, the predictions of a model can also influence future data. For example, if a model predicts that stock prices will increase, more stocks are bought and the stock prices do indeed move up. The opposite is also possible: once malware is detected, it is redesigned so that it will no longer be detected as malware in the future. These mechanisms are called self-fulfilling and self-defeating prophecies, respectively. This thesis aims to distinguish between prediction influence drift and intrinsic drift. First, a data generation method is developed in which a feedback loop exists between the data and the predictions of a machine learning model. Next, a prediction influence detection approach is created that calculates the density trajectories of the data before and after classification. Linear regression models are fitted on the trajectories before classification. If the predictions of these models differ from the density trajectories after classification, this indicates that drift started after classification. If prediction influence exists, the differences between the predictions and the trajectories within a true class are expected to differ. The detection approach is evaluated on synthetic data with and without prediction influence, and with and without intrinsic drift. Density trajectories on real-world data are presented and discussed. The data generator shows promising results, because the performance of a model under prediction influence is affected as expected. The detection approach does not show clear results. The density trajectories differ per type of influence, but the differences between predicted and calculated densities cannot be distinguished. The trajectories on real-world data are interesting, but because true labels are lacking, the approach cannot detect any prediction influence there. A first step toward detecting prediction influence has been taken, but the current approach does not work. The results show a clear influence on the performance of the machine learning models. Both the prediction influence data generator and the density trajectories look promising and could be used in new approaches.
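
The trajectory comparison described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `trajectory_residuals`, the use of evenly spaced time steps, and the assumption that densities are already available as plain numeric sequences are all hypothetical choices made for illustration. The sketch fits a straight line to the densities observed before classification, extrapolates it over the period after classification, and returns the gap between observed and extrapolated values.

```python
import numpy as np


def trajectory_residuals(pre_density, post_density):
    """Compare post-classification densities with a linear extrapolation.

    Hypothetical helper: both inputs are assumed to be 1-D sequences of
    density values sampled at consecutive, evenly spaced time steps.
    """
    pre = np.asarray(pre_density, dtype=float)
    post = np.asarray(post_density, dtype=float)

    # Time indices: the post-classification window continues where the
    # pre-classification window ends.
    t_pre = np.arange(len(pre))
    t_post = np.arange(len(pre), len(pre) + len(post))

    # Ordinary least-squares line through the pre-classification trajectory.
    slope, intercept = np.polyfit(t_pre, pre, deg=1)

    # Extrapolate the line and measure how far the observed
    # post-classification densities deviate from it.
    predicted = slope * t_post + intercept
    return post - predicted


# Residuals near zero mean the trajectory simply continued its earlier trend;
# larger residuals suggest a change that started after classification.
residuals = trajectory_residuals([1.0, 2.0, 3.0, 4.0], [7.0, 8.0])
```

Residuals alone do not say whether a change is caused by the predictions or by intrinsic drift; the summary describes comparing such differences within a true class for that purpose.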
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/40050
        Collections
        • Theses