FIONA - A Categorical Outlier Detector Framework
Summary
FIONA (FInding Outliers iN Attributes) is a novel framework designed for detecting outliers in categorical data. Outliers, often indicative of errors or anomalous observations, can have a significant impact on data analysis and decision-making processes.
For categorical attributes, detecting outliers requires defining a similarity metric between different values, which is harder than for numerical attributes. FIONA addresses this challenge by focusing on the syntactic structure of attribute values, providing a tool for identifying unusual patterns within datasets.
The framework operates in an unsupervised manner, eliminating the need for training examples. It leverages syntactic transformations, such as regular expressions and generalizations, to capture and analyze the structural characteristics of categorical values.
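As an illustration of what a syntactic generalization can look like, the following minimal sketch maps each value to a character-class pattern and optionally collapses repeated classes. The function name `generalize`, the symbols `D`, `U`, `L`, and the two levels are assumptions made for this example; they are not taken from FIONA's implementation.

```python
import re


def generalize(value: str, level: int = 1) -> str:
    """Map a value to a syntactic pattern (hypothetical scheme for illustration)."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("D")  # digit
        elif ch.isalpha():
            symbols.append("U" if ch.isupper() else "L")  # upper/lower letter
        else:
            symbols.append(ch)  # keep punctuation and spaces literally
    pattern = "".join(symbols)
    if level >= 2:
        # Coarser level: collapse runs of one class, e.g. "DDDD" becomes "D+".
        pattern = re.sub(r"([DUL])\1+", r"\1+", pattern)
    return pattern


print(generalize("AB-1234"))           # level 1 pattern
print(generalize("AB-1234", level=2))  # level 2 pattern
```

Values that share a pattern at some level are syntactically similar at that level, so no labeled training examples are needed to group them.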
By constructing a tree-like structure and applying a custom scoring function, FIONA systematically compares and evaluates the similarity of attribute values.
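The idea of scoring values against a hierarchy of patterns can be sketched as follows. This is a simplified stand-in, not FIONA's actual tree or scoring function: it assumes a two-level hierarchy (a coarse pattern with a fine pattern beneath it) and a hypothetical score equal to one minus the average share of values that agree with a given value at each level.

```python
from collections import Counter


def char_class(ch: str) -> str:
    # Hypothetical character classes: D = digit, A = letter, else literal.
    if ch.isdigit():
        return "D"
    if ch.isalpha():
        return "A"
    return ch


def fine_pattern(value: str) -> str:
    """Lower tree level: one class symbol per character."""
    return "".join(char_class(c) for c in value)


def coarse_pattern(value: str) -> str:
    """Upper tree level: consecutive identical symbols merged."""
    merged = []
    for sym in fine_pattern(value):
        if not merged or merged[-1] != sym:
            merged.append(sym)
    return "".join(merged)


def outlier_scores(values):
    """Score each value in [0, 1); higher means more syntactically unusual."""
    n = len(values)
    fine = Counter(fine_pattern(v) for v in values)
    coarse = Counter(coarse_pattern(v) for v in values)
    scores = {}
    for v in values:
        # Assumed weighting: both levels contribute equally.
        support = 0.5 * fine[fine_pattern(v)] / n + 0.5 * coarse[coarse_pattern(v)] / n
        scores[v] = 1.0 - support
    return scores


values = ["AB-123", "CD-456", "EF-789", "GH-12", "n/a"]
for value, score in sorted(outlier_scores(values).items(), key=lambda kv: -kv[1]):
    print(f"{value!r}: {score:.2f}")
```

In this toy column, `n/a` matches no other value at either level and ranks as the most unusual, while `GH-12` differs only at the fine level and receives an intermediate score.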
The evaluation of FIONA on various datasets demonstrates its effectiveness in outlier detection. While some false positives are identified, further analysis of them reveals interesting insights and highlights the importance of considering semantic context alongside syntactic structure. Unlike conventional baseline methods, FIONA scales efficiently to large datasets, making it a valuable tool for outlier detection in real-world applications.