Walking the Tightrope: Balancing Energy Efficiency and Accuracy in LLM-Driven Code Generation
Summary
Large Language Models (LLMs) consume significant amounts of energy during inference, especially for computationally expensive tasks like code generation, which raises environmental concerns. This work aims to reduce energy consumption during inference without compromising model performance. The energy consumption of Qwen2.5-Coder-7B-Instruct, Meta-LLaMA-3.1-8B-Instruct, and DeepSeekCoder-V2-Instruct-16B was evaluated on BigCodeBench, a benchmark consisting of 1,140 diverse coding tasks, using a software-based energy measurement approach. The effects of task nature, batch size, model size, fine-tuning, Activation-Aware Weight Quantization (AWQ), and GPTQ with 8-bit and 4-bit precision on energy consumption were investigated for a variety of models, including the Qwen2.5 models. Results indicate that task nature significantly affects energy consumption across all tested models, while batch size has only a minor effect. Notably, the Meta-LLaMA model consumed 130.77% more energy than the DeepSeekCoder model while achieving lower accuracy. Fine-tuning, AWQ, GPTQ-INT8, and GPTQ-INT4 quantization reduced energy consumption by up to 19%, 67%, 40%, and 67%, respectively. GPTQ-INT8 models achieved these reductions without a significant loss in accuracy, whereas GPTQ-INT4 models showed slight decreases and AWQ models showed substantially lower pass@1 scores. This work demonstrates that the energy consumption of LLMs can be reduced effectively without significant performance loss, underscoring the importance of continued research into sustainable AI practices.
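The summary refers to a software-based energy measurement approach combined with quantized models, but does not name the exact tooling. The sketch below is one possible realization, not the paper's setup: it loads a pre-quantized GPTQ checkpoint (the checkpoint name and prompt are assumptions for illustration) and reads NVIDIA's NVML cumulative energy counter before and after a single generation call to approximate per-request GPU energy. It assumes a Volta-or-newer GPU and the transformers/pynvml GPTQ loading prerequisites are installed.

```python
"""Minimal sketch: per-request GPU energy of a GPTQ-quantized code LLM.
Assumptions (not from the paper): the checkpoint name below, the prompt,
and the use of NVML as the software-based energy meter."""
import pynvml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8"  # assumed checkpoint name

# Load the pre-quantized model; the GPTQ configuration is read from the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

prompt = "Write a Python function that parses an ISO-8601 date string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# NVML reports cumulative energy in millijoules since driver load; the
# difference around generate() approximates the energy of this request.
energy_before_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
energy_after_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"Approx. GPU energy: {(energy_after_mj - energy_before_mj) / 1000:.1f} J")
pynvml.nvmlShutdown()
```

Repeating this measurement over a benchmark's tasks and comparing full-precision, AWQ, GPTQ-INT8, and GPTQ-INT4 checkpoints is the kind of comparison the summary describes, with pass@1 tracked alongside energy to check for accuracy loss.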