        • Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses

        Visual Narrator 2.0: From User Stories to Domain Models via Large and Small Language Models

        Rex__Master_Thesis.pdf (1.130Mb)
        Publication date
        2025
        Author
        Chou, Cheng Yi
        Summary
        Modeling is an important activity in Requirements Engineering (RE), yet performing it manually is time-consuming and tedious. When modeling starts from an existing set of early requirements, researchers have proposed tools that automatically extract domain models. Early rule-based systems eased the burden but relied on experts to craft linguistic rules for processing natural language, an approach that struggles with free-form text and typos. Recent studies show that large language models (LLMs) can automate the modeling task, potentially replacing the heuristic rules; however, their resource-intensive nature limits local deployment. This study therefore asks whether cost-efficient small language models (SLMs) can deliver performance comparable to that of LLMs and a rule-based system, and further explores the types of errors they tend to produce. We evaluate GPT-o1, Llama3-8B, Qwen-14B, and the rule-based system Visual Narrator (VN) on the task of domain-model extraction, measuring model completeness (F2) and validity (F0.5). Experiments use a dataset of nine projects comprising 487 user stories, and apply Friedman and Nemenyi tests to verify statistical significance. The results show that neither large nor small language models achieve a statistically significant improvement over VN in class identification. GPT-o1 outperforms the two SLMs in model validity on both class- and association-identification tasks, whereas Qwen-14B attains model completeness comparable to GPT-o1 on both tasks despite its smaller size. Error analysis reveals distinct profiles: VN tends to mislabel Role entities, whereas language models more frequently introduce Redundant/Derived elements, with SLMs additionally prone to Irrelevant and Attribute errors. These findings indicate that resource-efficient SLMs can rival LLMs in model completeness, shifting the focus from model scale to accessibility.
We also introduce an open-source evaluation system that supports reproducible research in automated requirements modeling.
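        The summary measures completeness as F2 and validity as F0.5, i.e. the standard F-beta score, which weights recall over precision when beta > 1 and precision over recall when beta < 1. As a minimal illustrative sketch (the precision/recall values below are hypothetical, not results from the thesis):

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 emphasizes recall, beta < 1 emphasizes precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical extraction result: half the proposed classes are correct
# (precision = 0.5) and every gold class was found (recall = 1.0).
completeness = f_beta(0.5, 1.0, beta=2.0)   # ≈ 0.833 (recall-weighted)
validity     = f_beta(0.5, 1.0, beta=0.5)   # ≈ 0.556 (precision-weighted)
```

        The same precision/recall pair yields a high completeness score but a low validity score, which is why the two metrics are reported separately.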
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/50024
        Collections
        • Theses

        Related items

        Showing items related by title, author, creator and subject.

        • Modeling dual-task performance: do individualized models predict dual-task performance better than average models? 

          Cao, W. (2017)
          Understanding multitasking can be a complicated venture. The goal of this paper is to see whether using individual parameters for modeling dual-task will lead to better predictions of individual performance compared to ...
        • Modelling Wastewater Quantity and Quality in Mexico -- using an agent-based model 

          Chen, Y. (2021)
          Wastewater is a key element in regional and global water circles, and the discharge of a large quantity of untreated wastewater is posing serious threats to the environment and public health in Mexico. To have a thorough ...
        • Modelling offshore wind in the IMAGE/TIMER model 

          Gernaat, D.E.H.J. (2012)
          Current global energy consumption is expected to continue to grow as the global population is likely to increase towards 9 billion in 2050 while income levels per capita surge with 3-5% per year. Resource depletion, climate ...