        •   Utrecht University Student Theses Repository Home
        FIONA - A Categorical Outlier Detector Framework

        MSc_Thesis_Thanos-23.pdf (1.039Mb)
        Publication date
        2024
        Author
        Tsiamis, Thanos
        Summary
        FIONA (FInding Outliers iN Attributes) is a novel framework designed for detecting outliers in categorical data. Outliers, often indicative of errors or anomalous observations, can have a significant impact on data analysis and decision-making processes. For categorical attributes, detecting outliers requires defining a similarity metric between different values, which is more intricate than for numerical attributes. FIONA addresses this challenge by focusing on the syntactic structures of attribute values, providing a powerful tool for identifying unusual patterns within datasets. The framework operates in an unsupervised manner, eliminating the need for training examples. It leverages syntactic transformations, such as regular expressions and generalizations, to capture and analyze the structural characteristics of categorical values. By constructing a tree-like structure and applying a custom scoring function, FIONA systematically compares and evaluates the similarity of attribute values. The evaluation of FIONA on various datasets demonstrates its effectiveness in outlier detection. While some false positives are identified, further analysis reveals interesting insights and highlights the importance of considering semantic context alongside the syntactic structures. FIONA’s scalability allows it to handle large datasets efficiently, in contrast to conventional baseline methods, making it a valuable tool for outlier detection in various real-world applications.
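To make the idea of syntactic generalization concrete, the following is a minimal sketch and not the thesis's actual algorithm. It assumes a simple three-level generalization (character classes, then case-insensitive classes, then collapsed runs) and a frequency-based rarity test in place of FIONA's tree structure and custom scoring function, neither of which is specified in this summary. The names `generalize`, `rare_patterns`, and `min_support` are hypothetical.

```python
import re
from collections import Counter


def generalize(value: str, level: int) -> str:
    """Map a value to a syntactic pattern; higher levels are coarser.

    Hypothetical levels, chosen for illustration only:
      0: the raw value
      1: digits -> D, uppercase -> U, lowercase -> L, other characters kept
      2: U and L merged into A (any cased letter)
      3: runs of A or D collapsed to A+ or D+
    """
    if level == 0:
        return value
    pattern = "".join(
        "D" if ch.isdigit() else "U" if ch.isupper() else "L" if ch.islower() else ch
        for ch in value
    )
    if level == 1:
        return pattern
    pattern = pattern.replace("U", "A").replace("L", "A")
    if level == 2:
        return pattern
    # Collapse each run of the same class symbol into "<symbol>+".
    return re.sub(r"([AD])\1*", r"\1+", pattern)


def rare_patterns(values, level=3, min_support=0.2):
    """Return values whose pattern is shared by fewer than min_support of all values.

    This plain frequency test is an assumed stand-in for a scoring function.
    """
    patterns = [generalize(v, level) for v in values]
    counts = Counter(patterns)
    n = len(values)
    return [v for v, p in zip(values, patterns) if counts[p] / n < min_support]


codes = ["AB-123", "CD-456", "EF-789", "GH-012", "xy-345", "12/AB"]
coarse_outliers = rare_patterns(codes, level=3)  # only structure differs
fine_outliers = rare_patterns(codes, level=1)    # letter case also matters
```

At the coarsest level only `12/AB` stands out, because its structure differs from the rest. At the finer level `xy-345` is flagged as well, because its lowercase letters no longer match the majority pattern. Choosing the level at which values are compared is therefore part of the design, which is presumably one reason the framework organizes generalizations hierarchically.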
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/47516
        Collections
        • Theses