        Exploring LLM-Based Semantic Representations in a Hybrid Approach for Automated Trace Link Recovery

        View/Open
        master_thesis.pdf (2.235Mb)
        Publication date
        2025
        Author
        Cheng, Junxin
        Summary
        Automated trace link recovery between issues and commits is essential for maintaining requirements traceability, as it reduces the manual effort required in large-scale software projects. This study investigates the effectiveness of large language models (LLMs) in generating semantic representations within a machine learning classification framework for automated trace link recovery. To this end, we formulate three research questions: (1) How effective are LLM embeddings as feature representations compared to information retrieval (IR) methods, static word embeddings, and Bidirectional Encoder Representations from Transformers (BERT)-based models? (2) What is the relative contribution of textual and non-textual features to supervised issue-commit link classification? (3) Which classification algorithm performs best when using the engineered features? We extract and construct three categories of feature sets (textual, non-textual, and a combination of both) from data drawn from eight open-source projects. We then apply five models, VSM with TF-IDF, FastText, Word2Vec, Sentence Transformer, and OpenAI's embedding model, to evaluate the effectiveness of semantic representations. These models are assessed using two classifiers (Random Forest and XGBoost) in two practical scenarios: trace recommendation and trace maintenance. Evaluation metrics include Precision, Recall, F2, and F0.5 scores, further supported by statistical significance tests and feature importance analysis. The results show that textual features generated by the VSM with TF-IDF consistently outperform other semantic and non-textual features, demonstrating not only the effectiveness of domain-specific term distributions captured by traditional IR methods but also the importance of high-quality semantic representations. Nonetheless, LLM-based models achieve comparable performance even without domain-specific fine-tuning, suggesting strong potential for automated trace link recovery. Additionally, Random Forest outperforms XGBoost in both evaluation scenarios. This comparative study provides practical insights into designing robust LLM-enhanced traceability support systems for requirements engineering in modern software development environments. We introduce a hybrid approach that integrates traditional IR models and static and contextual embeddings (including LLM-based representations) with both textual and non-textual features in a supervised classification framework. Future work may focus on fine-tuning LLMs for domain-specific contexts, enriching the feature space with additional development artifacts, and exploring prompt-based or interactive trace inference. Investigating lightweight deployment strategies and alternative classifiers also presents promising directions for practical and scalable use.
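        To illustrate the kind of pipeline the summary describes, the sketch below builds a TF-IDF cosine-similarity feature for issue-commit pairs, combines it with a non-textual feature, and trains a Random Forest evaluated with Precision, Recall, F2, and F0.5 scores. This is a minimal sketch based only on the summary, not the author's code; the toy data, the time-proximity feature, and all variable names are hypothetical.

# Minimal sketch (not the thesis implementation): issue-commit link classification
# using a TF-IDF similarity feature plus one illustrative non-textual feature,
# with a Random Forest classifier and F2/F0.5 evaluation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, fbeta_score

# Toy issue-commit pairs: (issue text, commit message, time-proximity feature, link label)
pairs = [
    ("fix login timeout on slow networks", "increase session timeout for auth", 0.9, 1),
    ("add export to CSV in report view",   "implement csv export for reports", 0.8, 1),
    ("crash when opening settings page",   "refactor logging configuration",   0.1, 0),
    ("update dependency versions",         "bump library versions in build",   0.7, 1),
    ("dark mode toggle not persisted",     "fix typo in readme",               0.2, 0),
    ("slow query on dashboard",            "add index to dashboard query",     0.6, 1),
    ("remove deprecated api endpoint",     "style changes to css",             0.1, 0),
    ("broken image upload",                "handle multipart upload errors",   0.5, 1),
]
issues, commits, time_proximity, labels = map(list, zip(*pairs))

# Textual feature: TF-IDF cosine similarity between issue and commit text
# (one of several representations compared in the thesis).
vectorizer = TfidfVectorizer().fit(issues + commits)
similarity = [
    cosine_similarity(vectorizer.transform([i]), vectorizer.transform([c]))[0, 0]
    for i, c in zip(issues, commits)
]

# Combine the textual feature with a (hypothetical) non-textual feature.
X = np.column_stack([similarity, time_proximity])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print("Precision:", precision_score(y_test, pred, zero_division=0))
print("Recall:   ", recall_score(y_test, pred, zero_division=0))
print("F2:       ", fbeta_score(y_test, pred, beta=2.0, zero_division=0))
print("F0.5:     ", fbeta_score(y_test, pred, beta=0.5, zero_division=0))

        In the study itself, the TF-IDF feature would be one of five compared representations (alongside FastText, Word2Vec, Sentence Transformer, and OpenAI embeddings), and XGBoost would be evaluated as a second classifier; the sketch shows only the overall feature-plus-classifier structure.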
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/50513
        Collections
        • Theses