dc.description.abstract | Automated Program Repair (APR) has emerged as a valuable tool for developers in
software development and maintenance. Despite recent advances in deep learning (DL),
DL-based APR approaches still have limitations. A notable gap in current state-of-the-art
APR methods is that they often require domain-specific knowledge and retraining when
transitioning to a different programming language.
This study explores Large Language Models (LLMs), specifically ChatGPT, as a promising
alternative for patch generation. Because LLMs do not require domain-specific knowledge and
can adapt seamlessly across programming languages, they could potentially overcome these
limitations. We evaluate ChatGPT's ability to generate software patches on the Defects4J v2.0
benchmark, testing it on a total of 476 bugs. For the purpose of the experiment, we assume
perfect fault localization, i.e., that the buggy lines are known in advance.
In our analysis, we compare the results of ChatGPT-based patch generation with those of
other state-of-the-art APR methods. Our findings show that ChatGPT performs comparatively
worse in this setting. Despite these current limitations, our study highlights untapped
potential in ChatGPT and other LLMs. With ongoing advancements, it is plausible that LLMs
will eventually surpass existing methods, although models like ChatGPT still need further
improvement and refinement to fully realize their potential. | |