Code Generation on a Diet: A Comparative Evaluation of Low-Parameter Large Language Models
Summary
In the constantly evolving field of software development, the demand for
automated code generation has significantly increased since the release of
AI-based tools such as ChatGPT and GitHub Copilot. These tools, powered by
Large Language Models (LLMs), typically require server requests due to
their closed-source nature and substantial computational costs. This thesis
investigates the potential of smaller, locally runnable low-parameter LLMs
in the context of code generation. The research begins with an overview of
the state of the art of software development and its anticipated evolution
through AI integration. It then analyzes the current landscape of LLMs,
explaining the underlying mechanisms of these models and presenting several
of the most important low-parameter models, such as Mistral, CodeLlama
and DeepSeek-Coder. The study also examines the impact of techniques
like fine-tuning, instruction-tuning and quantization on improving
performance and efficiency. Additionally, it reviews the available code
evaluation techniques, focusing on match-based and functional metrics,
and discusses the datasets used to evaluate the models. The methodology
employed involves selecting suitable datasets and models, generating code
samples, and evaluating them using both types of metrics. The evaluation
highlights the limitations of match-based metrics in capturing a model's
true code-generation performance and emphasizes the importance of
functional metrics like pass rates. The findings indicate that while larger
models generally outperform smaller ones, the performance gap is
narrowing, thanks to higher-quality and more domain-specific training
data. Moreover, the study confirms the effectiveness of the aforementioned
fine-tuning and quantization techniques in improving the models'
capabilities and lowering the resources needed to run them. The thesis
concludes by suggesting that with continuous advancements, smaller
models could play a crucial role in making high-quality code generation
more accessible and sustainable.