
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Dalpiaz, Fabiano
dc.contributor.author: Chou, Cheng Yi
dc.date.accessioned: 2025-08-28T00:01:30Z
dc.date.available: 2025-08-28T00:01:30Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/50024
dc.description.abstract: Modeling is an important activity in Requirements Engineering (RE), yet performing it manually is time-consuming and tedious. When modeling starts from an existing set of early requirements, researchers have proposed tools that automatically extract domain models. Early rule-based systems eased the burden but relied on experts to craft linguistic rules for processing natural language, an approach that struggles with free-form text and typos. Recent studies show that large language models (LLMs) can automate the modeling task, potentially replacing the heuristic rules; however, their resource-intensive nature limits local deployment. This study therefore asks whether cost-efficient small language models (SLMs) can deliver performance comparable to that of LLMs and a rule-based system, and further explores the types of errors they tend to produce. We evaluate GPT-o1, Llama3-8B, Qwen-14B, and the rule-based system Visual Narrator (VN) on the task of domain-model extraction, measuring model completeness (F2) and validity (F0.5). Experiments use a dataset of nine projects comprising 487 user stories, and apply Friedman and Nemenyi tests to verify statistical significance. The results show that neither large nor small language models achieve a statistically significant improvement over VN in class identification. GPT-o1 outperforms the two SLMs in model validity on both class- and association-identification tasks, whereas Qwen-14B attains model completeness comparable to GPT-o1 on both tasks despite its smaller size. Error analysis reveals distinct profiles: VN tends to mislabel Role entities, whereas language models more frequently introduce Redundant/Derived elements, with SLMs additionally prone to Irrelevant and Attribute errors. These findings indicate that resource-efficient SLMs can rival LLMs in model completeness, shifting the focus from model scale to accessibility. We also introduce an open-source evaluation system that supports reproducible research in automated requirements modeling.
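
Note: the completeness (F2) and validity (F0.5) scores named in the abstract are instances of the standard F-beta measure; the definition below is the general textbook formula, not a detail taken from the thesis itself:

F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), i.e. in LaTeX: $F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$

where P is precision and R is recall. Setting beta = 2 weights recall more heavily (completeness), while beta = 0.5 weights precision more heavily (validity).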
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This study investigates the performance of one leading large language model (GPT-o1) and two popular small language models (Llama3-8B and Qwen-14B) in the task of domain modeling. It also provides prompts and an automatic evaluation system for future researchers.
dc.title: Visual Narrator 2.0: From User Stories to Domain Models via Large and Small Language Models
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Requirements Engineering; NLP; LLMs; SLMs; Domain Modeling
dc.subject.courseuu: Business Informatics
dc.thesis.id: 52698

