Walking the Tightrope: Balancing Energy Efficiency and Accuracy in LLM-Driven Code Generation
Summary
Large Language Models (LLMs) consume significant amounts of energy during inference, especially for computationally expensive tasks like code generation, which raises environmental concerns. This work aims to reduce energy consumption during inference without compromising model performance. The energy consumption of Qwen2.5-Coder-7B-Instruct, Meta-LLaMA-3.1-8B-Instruct, and DeepSeekCoder-V2-Instruct-16B was evaluated on BigCodeBench, a benchmark consisting of 1,140 diverse coding tasks, using a software-based energy measurement approach. The effects of task nature, batch size, model size, fine-tuning, Activation-Aware Weight Quantization (AWQ), and GPTQ with 8-bit and 4-bit precision on energy consumption were investigated for a variety of models, including the Qwen2.5 models. Results indicate that task nature significantly affects energy consumption across all tested models, while batch size has only a minor effect. Notably, the Meta-LLaMA model consumed 130.77% more energy than the DeepSeekCoder model while achieving lower accuracy. Fine-tuning, AWQ, GPTQ-INT8, and GPTQ-INT4 quantization reduced energy consumption by up to 19%, 67%, 40%, and 67%, respectively. GPTQ-INT8 models achieved these reductions without a significant loss in accuracy, whereas GPTQ-INT4 models showed slight decreases and AWQ models showed substantially lower pass@1 scores. This work demonstrates that the energy consumption of LLMs can be reduced effectively without significant performance loss, underscoring the importance of continued research into sustainable AI practices.
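The summary refers to a software-based energy measurement approach combined with quantized models, but does not name the exact tooling. The sketch below is one possible realization, not the paper's setup: it loads a pre-quantized GPTQ checkpoint (the checkpoint name and prompt are assumptions for illustration) and reads NVIDIA's NVML cumulative energy counter before and after a single generation call to approximate per-request GPU energy. It assumes a Volta-or-newer GPU and the transformers/pynvml GPTQ loading prerequisites are installed.

```python
"""Minimal sketch: per-request GPU energy of a GPTQ-quantized code LLM.
Assumptions (not from the paper): the checkpoint name below, the prompt,
and the use of NVML as the software-based energy meter."""
import pynvml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8"  # assumed checkpoint name

# Load the pre-quantized model; the GPTQ configuration is read from the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

prompt = "Write a Python function that parses an ISO-8601 date string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# NVML reports cumulative energy in millijoules since driver load; the
# difference around generate() approximates the energy of this request.
energy_before_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
energy_after_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"Approx. GPU energy: {(energy_after_mj - energy_before_mj) / 1000:.1f} J")
pynvml.nvmlShutdown()
```

Repeating this measurement over a benchmark's tasks and comparing full-precision, AWQ, GPTQ-INT8, and GPTQ-INT4 checkpoints is the kind of comparison the summary describes, with pass@1 tracked alongside energy to check for accuracy loss.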