Beyond Vanilla RAG: Leveraging Advanced Techniques for Enhanced AI Retrieval
Summary
The growing emphasis on sustainability reporting and Environmental, Social, and Governance (ESG) scoring has created an urgent need for transparent, automated analysis of dense corporate disclosure documents. Current ESG scoring methodologies suffer from inconsistencies between rating providers, opaque scoring rationales, and heavy reliance on manual analysis that is both time-consuming and prone to subjectivity. This thesis addresses these challenges by investigating advanced Large Language Model (LLM)-based information retrieval frameworks for sustainability reporting analysis, focusing on Retrieval-Augmented Generation (RAG) and long-context techniques as a foundation for automated assessment of disclosure requirement satisfaction. RAG models retrieve relevant information from a knowledge base composed of document chunks, while long-context models use the entire context window of the LLM to generate responses.
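The contrast between the two paradigms can be made concrete with a minimal sketch, assuming a sentence-transformers embedding model; the naive character chunker, the model choice, and the top-k value are illustrative and not the thesis's exact configuration:

```python
# Minimal sketch of the vanilla-RAG vs. long-context contrast described above.
# Assumes sentence-transformers; chunk size, model, and top_k are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(document: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks (naive chunker)."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Vanilla RAG: return the top_k chunks most cosine-similar to the query."""
    chunk_emb = model.encode(chunks, normalize_embeddings=True)
    query_emb = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_emb @ query_emb          # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

def build_prompt(query: str, document: str, long_context: bool) -> str:
    """Long context passes the whole document; RAG passes only retrieved chunks."""
    context = document if long_context else "\n\n".join(retrieve(query, chunk(document)))
    return f"Context:\n{context}\n\nQuestion: {query}"
```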
This research evaluates five distinct LLM-based frameworks: Vanilla RAG, Long Context, Enhanced RAG, Self-Route, and HyDE. Vanilla RAG serves as the baseline, retrieving the document chunks most semantically similar to the query. Long Context processing provides the entire document as input without chunking, leveraging the full context window of modern LLMs. Enhanced RAG, the novel contribution of this thesis, combines semantic similarity with keyword-based BM25 search through ensemble retrieval, followed by OP-RAG reranking that orders retrieved contexts by page proximity to improve information coherence. Self-Route processes adaptively: it first attempts RAG-based retrieval and dynamically switches to long-context analysis when the retrieved chunks prove insufficient. HyDE (Hypothetical Document Embeddings) improves retrieval accuracy by first generating a hypothetical answer to the query and then using this generated response for similarity matching against document content. The evaluation employs the NEPAQuAD1.0 dataset, which contains 1,079 question-answer pairs derived from National Environmental Policy Act documents and serves as a proxy for sustainability disclosure analysis tasks. The sketches below illustrate the core retrieval mechanics of Enhanced RAG, Self-Route, and HyDE.
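A hedged sketch of Enhanced RAG's retrieval stage follows. The ensemble is approximated here with reciprocal rank fusion over BM25 and dense-similarity rankings (this summary does not specify the exact fusion scheme), and the OP-RAG step is rendered as presenting the winning chunks in page order; the `Chunk` structure, the `k_rrf` constant, and the precomputed `dense_scores` are illustrative assumptions, not the thesis's code:

```python
# Enhanced RAG sketch: BM25 + dense-similarity ensemble via reciprocal rank
# fusion, then OP-RAG-style reordering of the selected chunks by page position.
from dataclasses import dataclass

import numpy as np
from rank_bm25 import BM25Okapi

@dataclass
class Chunk:
    text: str
    page: int  # source page, used for the order-preserving rerank

def ranks(scores: np.ndarray) -> np.ndarray:
    """Rank positions (0 = highest score) for an array of scores."""
    order = np.argsort(scores)[::-1]
    r = np.empty(len(scores), dtype=int)
    r[order] = np.arange(len(scores))
    return r

def ensemble_retrieve(query: str, chunks: list[Chunk], dense_scores: np.ndarray,
                      top_k: int = 5, k_rrf: int = 60) -> list[Chunk]:
    """Fuse BM25 and dense rankings, then restore document (page) order."""
    bm25 = BM25Okapi([c.text.lower().split() for c in chunks])
    bm25_scores = bm25.get_scores(query.lower().split())
    # Reciprocal rank fusion: each ranker contributes 1 / (k + rank).
    fused = 1.0 / (k_rrf + ranks(bm25_scores)) + 1.0 / (k_rrf + ranks(dense_scores))
    top = np.argsort(fused)[::-1][:top_k]
    # OP-RAG-style step: emit the winners in page order, not score order,
    # so neighbouring passages stay coherent in the prompt.
    return sorted((chunks[i] for i in top), key=lambda c: c.page)
```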
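Self-Route's adaptive processing reduces to a conditional fallback; the sentinel-string convention for signalling "insufficient chunks" below is an illustrative assumption, not the thesis's exact mechanism:

```python
def self_route(query: str, document: str, rag_answer, long_context_answer) -> str:
    """Self-Route sketch: try RAG first; fall back to long-context analysis
    when the model judges the retrieved chunks insufficient (signalled here
    by a sentinel string in the answer)."""
    answer = rag_answer(query)
    if "unanswerable" in answer.lower():
        # Retrieval failed to surface enough evidence: re-answer with the
        # full document in the context window instead.
        return long_context_answer(query, document)
    return answer
```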
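HyDE can be sketched as a small change to the query side of retrieval: embed an LLM-drafted hypothetical answer rather than the raw question. Here `llm_generate` is a hypothetical stand-in for whatever chat-completion client the pipeline uses, and the prompt wording is an assumption:

```python
# HyDE sketch: a passage-shaped hypothetical answer lives closer to document
# chunks in embedding space than a terse question does.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_retrieve(query: str, chunks: list[str], llm_generate, top_k: int = 5) -> list[str]:
    hypothetical = llm_generate(
        f"Write a short passage that plausibly answers the question: {query}"
    )
    # Match chunks against the hypothetical passage, not the original query.
    query_emb = model.encode([hypothetical], normalize_embeddings=True)[0]
    chunk_emb = model.encode(chunks, normalize_embeddings=True)
    best = np.argsort(chunk_emb @ query_emb)[::-1][:top_k]
    return [chunks[i] for i in best]
```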
The comprehensive evaluation reveals Enhanced RAG as the strongest framework for sustainability reporting applications, achieving the best performance across all metrics (answer correctness M = 0.756, context recall M = 0.652, faithfulness M = 0.735) while maintaining computational efficiency comparable to Vanilla RAG. Its hybrid retrieval, combining semantic similarity with BM25 keyword matching, proves particularly effective for the precision-oriented queries typical of ESG disclosure requirements, achieving exceptional performance on closed questions (M = 0.926). Long Context demonstrates competitive answer correctness (M = 0.747) but requires roughly 13-fold longer execution time, while Self-Route achieves balanced performance through adaptive routing. Analysis by question type reveals framework-specific strengths: Enhanced RAG excels at factual validation tasks, Long Context performs best on complex reasoning that requires document-wide synthesis, and all frameworks struggle with open-ended divergent questions. Oracle analysis indicates a potential 12.2% performance improvement from selecting the optimal framework per question, demonstrating significant potential for ensemble approaches in automated ESG scoring systems.
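One plausible reading of the oracle computation is sketched below, assuming per-question answer-correctness scores for each framework; that the 12.2% is measured relative to the best single framework is an assumption of this sketch, not a statement of the thesis's exact procedure:

```python
# Oracle upper bound sketch: pick the best-scoring framework per question and
# compare against the best single framework; the gap is the headroom an
# ensemble router could capture.
import numpy as np

def oracle_gain(scores: dict[str, np.ndarray]) -> float:
    """scores maps framework name -> per-question answer-correctness array."""
    per_question = np.stack(list(scores.values()))   # shape: (frameworks, questions)
    oracle_mean = per_question.max(axis=0).mean()    # best framework chosen per question
    best_single = per_question.mean(axis=1).max()    # best single framework overall
    return (oracle_mean - best_single) / best_single # relative improvement, e.g. ~0.122
```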