
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Sosnovsky, Sergey
dc.contributor.author: Pozzi, Lorenzo
dc.date.accessioned: 2022-10-26T00:00:44Z
dc.date.available: 2022-10-26T00:00:44Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43073
dc.description.abstract: Automatic Keyword Extraction involves the identification of representative terms in a passage of interest. It has various applications, including topic modeling, semantic search, information retrieval, and text summarization, where a set of keywords is highly effective. Past research has shown the potential of Transformer models for the task. However, such architectures are limited by the need for labeled datasets, which require time and effort to annotate. The present research seeks to overcome this limitation by proposing an alternative annotation approach for Automatic Keyword Extraction corpora, generalizable to diverse domains with reduced cost and production time. A model based on Bidirectional Encoder Representations from Transformers (BERT) was then fine-tuned on the generated dataset to extract domain-specific terms. The experiments aim to validate the designed annotation procedure and shed light on BERT's capability to recognize relevant terms in domain-specific documents. This thesis also proposes an analysis of the word space generated by BERT in order to study the effect of fine-tuning on Automatic Keyword Extraction. The results showed that the proposed solution for dataset annotation was effective and that the implemented BERT-based model outperformed the baselines in all the proposed tasks. Moreover, the final analysis indicates that BERT's word space exhibits semantic coherence, since the generated embeddings are arranged according to their relatedness to the target domain.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The present research delves into the topic of domain-specific keyword identification. It involves the generation of datasets for the training and evaluation of deep learning models. In particular, BERT was examined and tested to assess its ability to adapt to a domain-oriented task using the generated dataset.
dc.title: Keyed Alike: Towards Versatile Domain-Specific Keyword Extraction with BERT
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: keyword extraction; deep learning; dataset generation
dc.subject.courseuu: Human-Computer Interaction
dc.thesis.id: 11498

