Named Entity Recognition and Relation Extraction with a Transformer-Based Model
Summary
One of the tasks of natural language processing (NLP) is information extraction, which aims to transform unstructured text data into structured information. Key components of this task are Named Entity Recognition (NER) and Relation Extraction (RE), which identify and classify entities and the relations between them within the text. Applications of NER and RE models include constructing knowledge graphs and supporting downstream systems such as machine translation and automated question answering. In recent years, deep learning models, including transformers, have achieved state-of-the-art performance in NER and RE tasks.
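As a concrete illustration of the NER task, the minimal sketch below runs a pretrained spaCy pipeline over a sentence and prints the recognized entities with their types. The model name en_core_web_sm and the example sentence are illustrative assumptions, not the fine-tuned model or data used in this thesis.

```python
# Minimal NER sketch with spaCy, assuming the small English model
# has been installed via: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Angela Merkel visited Microsoft headquarters in Redmond.")

# Each recognized entity carries a surface text and a type label
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Angela Merkel PERSON"
```

An RE system would then classify the relations holding between such entity pairs, for example (Microsoft, headquartered_in, Redmond).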
This thesis examines the performance of the transformer-based model LUKE (Language Understanding with Knowledge-based Embeddings) for NER and RE tasks. We benchmark LUKE against the spaCy solution for NER and the REBEL (Relation Extraction By End-to-End Language Generation) model for RE using a manually labeled dataset of 335 news articles.
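For reference, LUKE is available through the Hugging Face transformers library as an entity span classifier. The sketch below is a generic usage example, not the exact configuration benchmarked in this thesis; the checkpoint studio-ousia/luke-large-finetuned-conll-2003, the sentence, and the candidate spans are assumptions for illustration.

```python
# Sketch: NER with LUKE as entity span classification, assuming the
# publicly available CoNLL-2003 fine-tuned checkpoint.
import torch
from transformers import LukeTokenizer, LukeForEntitySpanClassification

model_name = "studio-ousia/luke-large-finetuned-conll-2003"
tokenizer = LukeTokenizer.from_pretrained(model_name)
model = LukeForEntitySpanClassification.from_pretrained(model_name)

text = "Beyoncé lives in Los Angeles."
# Candidate entity spans as character offsets into the text;
# in practice all plausible spans are enumerated and scored.
entity_spans = [(0, 7), (17, 28)]

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One label prediction per candidate span (label 0 is "no entity")
predictions = outputs.logits.argmax(-1).squeeze().tolist()
for (start, end), pred in zip(entity_spans, predictions):
    if pred != 0:
        print(text[start:end], model.config.id2label[pred])
```

Unlike token-level taggers, LUKE scores entity spans directly, which is why candidate spans are passed to the tokenizer alongside the text.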
Here we show that while LUKE demonstrates strong performance in NER, it is outperformed by the fine-tuned spaCy model on our specific dataset. For RE tasks, LUKE's relation classifier shows outstanding performance, but the sequential LUKE-based RE pipeline does not match the performance of the end-to-end REBEL model. These results provide insights into the strengths and limitations of LUKE, guiding future efforts in model selection.
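To clarify the contrast between the two RE approaches: the sequential pipeline first detects entities and then classifies the relation for each candidate pair, so NER errors propagate into the relation step, whereas REBEL generates relation triplets directly from raw text as a linearized sequence. The sketch below follows the usage shown on the public Babelscape/rebel-large model card; the example sentence is an assumption for illustration, and the exact generated output may differ.

```python
# Sketch: end-to-end relation extraction with REBEL, assuming the
# publicly released Babelscape/rebel-large checkpoint.
from transformers import pipeline

extractor = pipeline("text2text-generation", model="Babelscape/rebel-large")

text = "Punta Cana is a resort town in the Dominican Republic."

# Decode manually so the special tokens (<triplet>, <subj>, <obj>)
# that delimit subject, object, and relation are kept in the output.
out = extractor(text, return_tensors=True, return_text=False)
decoded = extractor.tokenizer.batch_decode([out[0]["generated_token_ids"]])

# Output is a linearized triplet sequence, roughly:
# "<triplet> Punta Cana <subj> Dominican Republic <obj> country"
print(decoded[0])
```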