dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Abzianidze, Lasha
dc.contributor.author: Girish Nair, Madhav
dc.date.accessioned: 2025-08-29T00:03:33Z
dc.date.available: 2025-08-29T00:03:33Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/50137
dc.description.abstract: The growing emphasis on sustainability reporting and Environmental, Social, and Governance (ESG) scoring has created an urgent need for transparent, automated analysis of dense corporate disclosure documents. Current ESG scoring methodologies suffer from inconsistencies between rating providers, a lack of transparency in the scoring rationale, and a heavy reliance on manual analysis that is both time-consuming and prone to subjectivity. This thesis addresses these challenges by investigating advanced Large Language Model (LLM) based information retrieval frameworks for sustainability reporting analysis, focusing on Retrieval-Augmented Generation (RAG) and Long Context techniques as a foundation for automatically assessing whether disclosure requirements are satisfied. RAG models retrieve relevant information from a knowledge base composed of document chunks, while long-context models pass the entire document through the context window of the LLM to generate responses. This research evaluates five LLM-based frameworks: Vanilla RAG, Long Context, Enhanced RAG, Self-Route, and HyDE. Vanilla RAG serves as the baseline approach, retrieving the most semantically similar document chunks to answer queries. Long Context processing provides entire documents as input without chunking, leveraging the full context window of modern LLMs. The Enhanced RAG framework, the novel contribution of this thesis, combines semantic similarity with keyword-based BM25 search through ensemble retrieval, followed by OP-RAG reranking that organizes retrieved contexts by page proximity to improve information coherence. Self-Route employs adaptive processing, first attempting RAG-based retrieval and switching to long-context analysis when the retrieved chunks prove insufficient. HyDE (Hypothetical Document Embeddings) aims to improve retrieval accuracy by first generating hypothetical answers to queries, then using these generated responses for similarity matching against document content. The evaluation employs the NEPAQuAD1.0 dataset, which contains 1,079 question-answer pairs derived from National Environmental Policy Act documents and serves as a proxy for sustainability disclosure analysis tasks. The evaluation identifies Enhanced RAG as the best-performing framework for sustainability reporting applications, achieving the highest scores across all metrics (answer correctness M = 0.756, context recall M = 0.652, faithfulness M = 0.735) while maintaining computational efficiency comparable to Vanilla RAG. Enhanced RAG’s hybrid retrieval approach, which combines semantic similarity with BM25 keyword matching, proves particularly effective for the precision-oriented queries typical of ESG disclosure requirements, with its strongest results on closed questions (M = 0.926). Long Context demonstrates competitive answer correctness (M = 0.747) but requires a 13-fold longer execution time, while Self-Route achieves balanced performance through adaptive routing. Question type analysis reveals framework-specific strengths: Enhanced RAG excels in factual validation tasks, Long Context performs best for complex reasoning requiring document-wide synthesis, and all frameworks struggle with open-ended divergent questions. Oracle analysis indicates a potential 12.2% performance improvement through optimal framework selection per question, which points to significant potential for ensemble approaches in automated ESG scoring systems.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Evaluating RAG vs. Long-context LLM-based frameworks under the scope of sustainability-related disclosure regulations
dc.title: Beyond Vanilla-RAG: Leveraging Advanced Techniques for Enhanced AI Retrieval
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: RAG; Long Context; LLM; Sustainability Reporting; ESG Score; LLM-as-Judge
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 53210
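
The abstract describes the Enhanced RAG pipeline only at a high level: ensemble retrieval over a semantic ranking and a BM25 ranking, followed by a reordering of the retrieved contexts by page. The sketch below is one possible reading of that idea, not the thesis implementation. It assumes the two rankings are fused with weighted reciprocal rank fusion (the abstract does not name the fusion rule), uses the conventional constant k = 60, and treats "page proximity" as simply sorting the selected chunks by page number. The function names, chunk ids, and page numbers are all hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of chunk ids with weighted reciprocal rank fusion.

    Assumption: the record does not say how the ensemble combines scores;
    reciprocal rank fusion is used here purely for illustration.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def order_by_page(chunk_ids, page_of, top_n):
    """Keep the top_n fused chunks, then arrange them in page order.

    Assumption: a simplified stand-in for the page-based reordering step.
    """
    selected = chunk_ids[:top_n]
    return sorted(selected, key=lambda cid: (page_of[cid], cid))


# Hypothetical toy data: two rankings of the same chunk pool.
semantic_ranking = ["c7", "c2", "c9", "c4"]  # e.g. from embedding similarity
bm25_ranking = ["c2", "c4", "c7", "c1"]      # e.g. from keyword matching
page_of = {"c1": 1, "c2": 12, "c4": 3, "c7": 8, "c9": 20}

fused = weighted_rrf([semantic_ranking, bm25_ranking], weights=[0.5, 0.5])
context_order = order_by_page(fused, page_of, top_n=3)

print(fused)          # ['c2', 'c7', 'c4', 'c9', 'c1']
print(context_order)  # ['c4', 'c7', 'c2']
```

Chunks that appear in both rankings (c2, c7, c4) rise above chunks found by only one retriever, and the final context is passed to the model in page order rather than relevance order.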

