Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Gatt, A.
dc.contributor.author: Sie, Mika
dc.date.accessioned: 2024-12-31T00:01:48Z
dc.date.available: 2024-12-31T00:01:48Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48282
dc.description.abstract: Automatic Text Summarization (ATS) is the process of automatically summarizing a text. Advancements in neural models in Natural Language Processing (NLP) have significantly enhanced summarization capabilities, making it a crucial tool for processing extensive regulatory documents. Long regulatory texts are challenging to summarize due to their length and complexity. To address this, a two- and multi-step extractive-abstractive architecture is proposed to handle lengthy regulatory documents more effectively. This research shows that the effectiveness of a two-step architecture for summarizing long regulatory texts varies significantly depending on the model used. Specifically, the two-step architecture improves the performance of decoder-only models. For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long context encoder-decoder models, the extractive step worsens their performance. This research also highlights the challenges of evaluating generated texts, as evidenced by the differing results from human and automated evaluations. Most notably, human evaluations favoured legal language models, while automated metrics preferred general language models. The results underscore the importance of selecting the appropriate summarization strategy based on model architecture and context length. Broadly, this research contributes to the development of more efficient and accurate tools for summarizing complex regulatory documents, enhancing accessibility, and aiding in compliance and decision-making processes in dynamic industries such as the green molecule sector.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Step up your game: A research on two/multi-step summarisation of long, regulatory documents
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Automatic Text Summarization; Extractive Summarization; Abstractive Summarization; Natural Language Processing; Two-step; Multi-step
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 34820

