
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Behrisch, Michael
dc.contributor.author: Raaij, Ruben van
dc.date.accessioned: 2025-08-15T00:03:57Z
dc.date.available: 2025-08-15T00:03:57Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/49754
dc.description.abstract: This thesis introduces the Collaborative Role-Oriented Workflow for SQL generation (CROW-SQL), a modular multi-agent framework designed to improve the reliability, accuracy, and interpretability of Text-to-SQL generation using Large Language Models (LLMs). Rather than relying on a monolithic prompting strategy, CROW-SQL decomposes Structured Query Language (SQL) generation into collaborative subtasks (query generation, schema suggestion, refinement, and orchestration), each handled by an independent, specialized agent. All agents are instantiated from the same LLM backend, primarily Gemini 2.0 Flash, ensuring a fair and controlled evaluation of agent behavior. To evaluate the system’s effectiveness, we benchmark CROW-SQL on two datasets: the academic Spider benchmark and the real-world BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation (BIRD) dataset. Experiments vary key parameters such as the Query Generation Budget (QGB), few-shot prompt size, and agent composition. Evaluation metrics include execution accuracy, SQL correctness, structure correctness, skeleton similarity, Levenshtein distance, and runtime. A comparative study between Gemini 2.0 Flash and GPT-4o mini, the lightweight variant of OpenAI’s Generative Pre-trained Transformer 4o, highlights Gemini’s stronger performance in structural alignment and execution robustness within the multi-agent context. The results show that multi-agent configurations significantly outperform single-agent baselines, especially on complex queries. The Refiner Agent plays a critical role in recovering from execution failures. Optimal performance is achieved with a Query Generation Budget of 3, beyond which diminishing returns are observed. The modular architecture also enhances transparency, debugging, and deployability, making CROW-SQL particularly suitable for enterprise and compliance-focused applications. This work contributes a reproducible, tool-augmented framework for agent-based Text-to-SQL reasoning and sets the stage for future research in schema-aware prompting and adaptive agent routing for SQL generation.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Metrics and Benchmarks for Evaluating Dutch Prompt-Based Text-to-SQL Systems
dc.title: Metrics and Benchmarks for Evaluating Dutch Prompt-Based Text-to-SQL Systems
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: LLM, Agents, Text-to-SQL
dc.subject.courseuu: Data Science
dc.thesis.id: 51687

