        Utrecht University Student Theses Repository

        Measuring the Behavioral Quality of Log Sampling

        File
        Scriptie-met-bijlages.zip (2.031Mb)
        Publication date
        2019
        Author
        Knols, B.
        Summary
        Process mining is a discipline that combines techniques from data mining and process analysis. It has three main goals, all realised by analysing event logs: process discovery, conformance checking and process enhancement. Especially in process discovery, the size of these event logs is becoming a problem for the tools currently available in process mining. This matters because many process mining algorithms are exploratory and are often run several times with different parameters. One solution is to reduce the data by sampling the event log. However, although many sampling approaches exist, their quality is unknown. This thesis studies the quality of random samples from an event log by introducing six quality measures based on the behavior of the event log. The approach is backed by theory and has been implemented in the tool ProM. Experiments show that sampling very quickly introduces under- and oversampled behavior in the event log, at both high and low sample rates. Unsampled behavior, i.e. behavior that is completely absent from a sample, also occurs in all samples, which can be problematic for frequency-based algorithms. Future research should study the sampling of event logs thoroughly to help practitioners choose sample rates and sampling techniques.
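        To make the notions of under-, over- and unsampled behavior concrete, here is a minimal Python sketch. It is not the thesis's six quality measures or its ProM implementation. For illustration only, it assumes that "behavior" can be approximated by trace variants (distinct activity sequences). It draws a simple random sample of traces and compares each variant's relative frequency in the sample with its frequency in the full log. The names `Trace`, `sample_log` and `classify_variants` are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter
from fractions import Fraction

# Hypothetical representation: a trace is the sequence of activity names of one case.
Trace = tuple[str, ...]


def sample_log(log: list[Trace], rate: float, seed: int | None = None) -> list[Trace]:
    """Draw a simple random sample of traces (without replacement) at the given rate."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    k = round(len(log) * rate)
    return random.Random(seed).sample(log, k)


def classify_variants(log: list[Trace], sample: list[Trace]) -> dict[Trace, str]:
    """Label each trace variant of the log by how well the sample preserves its frequency.

    Assumption (not from the thesis): behavior is approximated by trace variants,
    and a variant is under- or oversampled when its relative frequency in the
    sample is lower or higher than its relative frequency in the full log.
    """
    log_counts = Counter(log)
    sample_counts = Counter(sample)
    labels: dict[Trace, str] = {}
    for variant, count in log_counts.items():
        in_sample = sample_counts[variant]  # Counter returns 0 for missing keys
        if in_sample == 0:
            labels[variant] = "unsampled"
            continue
        # Exact rational arithmetic avoids float rounding in the comparison.
        log_share = Fraction(count, len(log))
        sample_share = Fraction(in_sample, len(sample))
        if sample_share < log_share:
            labels[variant] = "undersampled"
        elif sample_share > log_share:
            labels[variant] = "oversampled"
        else:
            labels[variant] = "proportional"
    return labels
```

        For example, take a log of ten traces (six of (a, b, c), three of (a, c, b) and one of (a, d)) and a five-trace sample containing two (a, b, c) and three (a, c, b). classify_variants then reports (a, b, c) as undersampled (2/5 < 6/10), (a, c, b) as oversampled (3/5 > 3/10) and (a, d) as unsampled. Even a 50% sample can thus distort every variant at once.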
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/33544
        Collections
        • Theses