        Named Entity Recognition and Relation Extraction with a Transformer-Based Model

        View/Open
        ADS_thesis_Natalia_Kurdanova.pdf (860.5Kb)
        Publication date
        2024
        Author
        Kurdanova, Natalia
        Summary
        One of the tasks of natural language processing (NLP) is information extraction, which aims to transform unstructured text data into structured information. Key components of this task include Named Entity Recognition (NER) and Relation Extraction (RE), which focus on the identification and classification of entities and the relations between them within the text. The areas of application for NER and RE models include constructing knowledge graphs and supporting applications like machine translation and automated question-answering systems. In recent years, deep learning models, including transformers, have achieved state-of-the-art performance in NER and RE tasks. This thesis examines the performance of the transformer-based model LUKE (Language Understanding with Knowledge-based Embeddings) for NER and RE tasks. We benchmark LUKE against the spaCy solution for NER and the REBEL (Relation Extraction By End-to-End Language Generation) model for RE using a manually labeled dataset of 335 news articles. Here we show that while LUKE demonstrates strong performance in NER, it is outperformed by the fine-tuned spaCy model on our specific dataset. For RE tasks, LUKE’s relation classifier shows outstanding performance, but the sequential LUKE-based RE pipeline does not match the performance of the end-to-end REBEL model. These results provide insights into the strengths and limitations of LUKE, guiding future efforts in model selection.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46873
        Collections
        • Theses