dc.description.abstract | Modeling is an important activity in Requirements Engineering (RE), yet performing it manually is time-consuming and tedious. When modeling starts from an existing set of early requirements, researchers have proposed tools that automatically extract domain models. Early rule-based systems eased the burden but relied on experts to create linguistic rules for processing natural language, an approach that struggles with free-form text and typos.
Recent studies show that large language models (LLMs) can automate the modeling task, potentially replacing the heuristic rules; however, their resource-intensive nature limits local deployment. This study therefore asks whether cost-efficient small language models (SLMs) can deliver performance comparable to that of LLMs and a rule-based system, and further explores the types of errors they tend to produce. We evaluate GPT-o1, Llama3-8B, Qwen-14B, and the rule-based system Visual Narrator (VN) on the task of domain-model extraction, measuring model completeness (F2) and validity (F0.5). Experiments use a dataset of nine projects comprising 487 user stories, and apply Friedman and Nemenyi tests to verify statistical significance.
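The two evaluation measures are weighted F-scores: F2 weights recall over precision (completeness), while F0.5 weights precision over recall (validity). A minimal sketch of how such scores could be computed from hypothetical extraction counts (the counts below are illustrative, not from the study):

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Generic F-beta score: beta > 1 favours recall (completeness, F2);
    beta < 1 favours precision (validity, F0.5)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts for one extracted domain model (for illustration only):
tp, fp, fn = 8, 2, 4              # true positives, false positives, false negatives
precision = tp / (tp + fp)        # 0.8
recall = tp / (tp + fn)           # ~0.667

completeness = f_beta(precision, recall, beta=2.0)   # F2, recall-weighted
validity = f_beta(precision, recall, beta=0.5)       # F0.5, precision-weighted
```

With these counts, completeness (F2) is pulled down by the missed elements while validity (F0.5) stays closer to the precision value, which is exactly the asymmetry the two metrics are chosen to capture.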
The results show that neither large nor small language models achieve a statistically significant improvement over VN in class identification. GPT-o1 outperforms the two SLMs in model validity on both class- and association-identification tasks, whereas Qwen-14B attains model completeness comparable to GPT-o1 on both tasks despite its smaller size. Error analysis reveals distinct profiles: VN tends to mislabel Role entities, whereas language models more frequently introduce Redundant/Derived elements, with SLMs additionally prone to Irrelevant and Attribute errors.
These findings indicate that resource-efficient SLMs can rival LLMs in model completeness, shifting the focus from model scale to accessibility. We also introduce an open-source evaluation system that supports reproducible research in automated requirements modeling.