Code Generation on a Diet: A Comparative Evaluation of Low-Parameter Large Language Models
Summary
In the constantly evolving field of software development, the demand for
automated code generation has significantly increased since the release of
AI-based tools such as ChatGPT and GitHub Copilot. These tools, powered by
Large Language Models (LLMs), typically require server requests due to
their closed-source nature and substantial computational costs. This thesis
investigates the potential of smaller, locally runnable low-parameter LLMs
in the context of code generation. The research begins with an overview of
the state of the art of software development and its anticipated evolution
through AI integration. It then analyzes the current landscape of LLMs,
explaining the underlying mechanisms of these models and presenting several
of the most important low-parameter models, such as Mistral, CodeLlama
and DeepSeek-Coder. The study also examines the impact of techniques
like fine-tuning, instruction-tuning and quantization on improving
performance and efficiency. Additionally, it reviews the available code
evaluation techniques, focusing on match-based and functional metrics,
and discusses the datasets used to evaluate the models. The methodology
employed involves selecting suitable datasets and models, generating code
samples, and evaluating them using both types of metrics. The evaluation
highlights the limitations of match-based metrics in capturing a model's
true code-generation performance and emphasizes the importance of
functional metrics like pass rates. The findings indicate that while larger
models generally outperform smaller ones, the performance gap is
narrowing, thanks to higher-quality and more domain-specific training
data. Moreover, the study confirms the effectiveness of the aforementioned
fine-tuning and quantization techniques in improving the models'
capabilities and lowering the resources needed to run them. The thesis
concludes by suggesting that with continuous advancements, smaller
models could play a crucial role in making high-quality code generation
more accessible and sustainable.