Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Schraagen, M.P.
dc.contributor.advisor: Lamprecht, A.L.
dc.contributor.advisor: Lucas, R.W.
dc.contributor.author: Luijtgaarden, N. van de
dc.date.accessioned: 2019-09-26T17:00:27Z
dc.date.available: 2019-09-26T17:00:27Z
dc.date.issued: 2019
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/34261
dc.description.abstract: With the legal sector embracing digitalization, the increasing availability of information has led to a need for systems that can automatically summarize one or more documents. Current research on legal text summarization has focused only on extractive methods, which can result in awkward summaries because sentences in legal documents can be very long and detailed. In this study, we argue that with more data available, improved hardware, and matured algorithms, the time is right for using abstractive summarization models in the legal field. The main goal of this thesis is to discuss how we can best apply an abstractive summarization model to a legal domain dataset. A five-phased approach was used to evaluate generated summaries based on ROUGE score, abstractiveness, and a human evaluation experiment with law graduates. ROUGE results of our experiments are comparable to those of state-of-the-art studies that used the CNN/Daily Mail dataset. Experiments show that the model excels at rewriting long and redundant legal sentences into much shorter ones, but does not generate many new words compared to the input document. However, the human evaluation showed that the elements needed in a summary (background, considerations, judgement) were not always present together in a generated summary, and that reference summaries received better relevance scores. Still, students observed that generated summaries did contain key information about cases and preferred them to reference summaries that contain only keywords. Through this study, we argue that there is a lot of potential for abstractive summarization in the legal field. The quality is not on the same level as the reference summaries, but generated summaries can function as a good replacement for reference summaries that contain only keywords. To improve relevance in the generated summaries, an implementation of a network that can recognize the three core elements of a case is needed. For readability, additional post-processing in the decoding function can help recognize when sentences are cut off too early. In general, we also doubt whether ROUGE is still a good metric for evaluating abstractive summarization models, as there is an inverse relationship between the ROUGE score and the abstractiveness of a document.
dc.description.sponsorship: Utrecht University
dc.format.extent: 1163748
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Automatic Summarization of Legal Text
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Text summarization, abstractive summarization, legal text, artificial intelligence, neural networks
dc.subject.courseuu: Business Informatics

